REPEAT SEQUENCES OF THE CA125 GENE AND THEIR USE
FOR DIAGNOSTIC AND THERAPEUTIC INTERVENTIONS 

CROSS REFERENCE TO RELATED APPLICATIONS 

This application claims the benefit of U.S. Provisional Application Serial No. 60/284,175,
filed April 17, 2001, and U.S. Provisional Application Serial No. 60/299,380, filed June 19, 2001,
which are incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION 

The present invention relates generally to the cloning, identification, and expression of
multiple repeat sequences of the CA125 gene in vitro and, more specifically, to the use of
recombinant CA125 with epitope binding sites for diagnostic and therapeutic purposes.

CA125 is an antigenic determinant located on the surface of ovarian carcinoma cells with
essentially no expression in normal adult ovarian tissue. Elevated in the sera of patients with
ovarian adenocarcinoma, CA125 has played a critical role for more than 15 years in the
management of these patients relative to their response to therapy and also as an indicator of
recurrent disease.

It is well established that CA125 is not uniquely expressed in ovarian carcinoma, but is
also found in both normal secretory tissues and other carcinomas (i.e., pancreas, liver, colon)
[Hardardottir H et al., Distribution of CA125 in embryonic tissue and adult derivatives of the
fetal periderm, Am J Obstet Gynecol. 163;6(1):1925-1931 (1990); Zurawski VR et al., Tissue
distribution and characteristics of the CA125 antigen, Cancer Rev. 11-12:102-108 (1988);
O'Brien TJ et al., CA125 antigen in human amniotic fluid and fetal membranes, Am J Obstet
Gynecol. 155:50-55 (1986); and Nap M et al., Immunohistochemical characterization of 22
monoclonal antibodies against the CA125 antigen: 2nd report from the ISOBM TD-1 workshop,
Tumor Biology 17:325-332 (1996)]. Nevertheless, CA125 correlates directly with the
disease status of affected patients (i.e., progression, regression, and no change), and has become
the "gold standard" for monitoring patients with ovarian carcinoma [Bast RC et al., A
radioimmunoassay using a monoclonal antibody to monitor the course of epithelial ovarian
cancer, N Engl J Med. 309:883-887 (1983); and Bon GC et al., Serum tumor marker
immunoassays in gynecologic oncology: Establishment of reference values, Am J Obstet
Gynecol. 174:107-114 (1996)]. CA125 is especially useful in post-menopausal patients where
endometrial tissue has become atrophic and, as a result, is not a major source of normal
circulating CA125.

During the mid 1980's, the inventor of the present invention and others developed M11, a
monoclonal antibody to CA125. M11 binds to a dominant epitope on the repeat structure of the
CA125 molecule [O'Brien TJ et al., New monoclonal antibodies identify the glycoprotein
carrying the CA125 epitope, Am J Obstet Gynecol. 165:1857-64 (1991)]. More recently, the
inventor and others developed a purification and stabilization scheme for CA125, which allows
for the accumulation of highly purified high molecular weight CA125 [O'Brien TJ et al., More
than 15 years of CA125: What is known about the antigen, its structure and its function, Int J
Biological Markers 13(4):188-195 (1998)].

Considerable progress has been made over the years to further characterize the CA125
molecule, its structure and its function. The CA125 molecule is a high molecular weight
glycoprotein with a predominance of O-linked sugar side chains. The native molecule exists as a
very large complex (~2-5 million daltons). The complex appears to be composed of an
epitope-containing CA125 molecule and binding proteins which carry no CA125 epitopes. The CA125
molecule is heterogeneous in both size and charge, most likely due to continuous deglycosylation
of the side chains during its life-span in bodily fluids. The core CA125 subunit is in excess of
200,000 daltons, and retains the capacity to bind both OC125 and M11 class antibodies. While
the glycoprotein has been described biochemically and metabolically by the inventor of the
present invention and others, no one has yet cloned the CA125 gene, which would provide the
basis for understanding its structure and its physiologic role in both normal and malignant tissues.

Despite the advances in detection and quantitation of serum tumor markers like CA125,
the majority of ovarian cancer patients are still diagnosed at an advanced stage of the disease,
Stage III or IV. Further, the management of patients' responses to treatment and the detection of
disease recurrence remain major problems. There thus remains a need to significantly improve
and standardize current CA125 assay systems. Further, the development of an early indicator of
risk of ovarian cancer will provide a useful tool for early diagnosis and improved prognosis.

SUMMARY OF THE INVENTION

The CA125 gene has been cloned and multiple repeat sequences as well as the carboxy 
terminus have been identified. CA125 requires a transcript of more than 35,000 bases and 
occupies approximately 150,000 bp on chromosome 19q13.2. The CA125 molecule comprises
three major domains: an extracellular amino terminal domain (Domain 1); a large multiple repeat 
domain (Domain 2); and a carboxy terminal domain (Domain 3) which includes a transmembrane 
anchor with a short cytoplasmic domain. The amino terminal domain is assembled by combining 
five genomic exons, four very short amino terminal sequences and one extraordinarily large 
exon. This domain is dominated by its capacity for O-glycosylation and its resultant richness in 
serine and threonine residues. 

The extracellular repeat domain, which characterizes the CA125 molecule, also represents 
a major portion of the CA125 molecular structure. It is downstream from the amino terminal 
domain and presents itself in a much different manner to its extracellular matrix neighbors. 
These repeats are characterized by many features including a highly-conserved nature and a
uniformity in exon structure. Most consistently, each repeat contains a cysteine-enclosed
sequence that may form a cysteine loop. Domain 2 comprises 156 amino acid repeat units of the
CA125 molecule. The repeat domain constitutes the largest proportion of the CA125 molecule.
The repeat units also include the epitopes now well-described and classified for both major
classes of CA125 antibodies, the OC125 group and the M11 group. More than 60 repeat units have
been identified, sequenced, and contiguously placed in the CA125 domain structure. The repeat
sequences demonstrated 70-85% homology to each other. The existence of the repeat sequences
was confirmed by expression of the recombinant protein in E. coli, where both OC125 and M11 class
antibodies were found to bind to sites on the CA125 repeat.

The CA125 molecule is anchored at its carboxy terminal through a transmembrane 
domain and a short cytoplasmic tail. The carboxy terminal also contains a proteolytic cleavage 
site approximately 50 amino acids upstream from the transmembrane domain, which allows for 
proteolytic cleavage and release of the CA125 molecule. 

The identification and sequencing of multiple repeat domains of the CA125 antigen
provides potentially new clinical and therapeutic applications for detecting, monitoring and
treating patients with ovarian cancer and other carcinomas where CA125 is expressed. For
example, the ability to express repeat domains of CA125 with the appropriate epitopes would
provide a much needed standard reagent for research and clinical applications. Current assays for
CA125 utilize as standards either CA125 produced from cultured cell lines or from patient ascites
fluid. Neither source is defined with regard to the quality or purity of the CA125 molecule. The
present invention overcomes the disadvantages of current assays by providing multiple repeat
domains of CA125 with epitope binding sites. One or more of the more than 60
repeats shown in Table 16 can be used as a "gold standard" for testing the presence of CA125.
Furthermore, new and more specific assays may be developed utilizing recombinant products for
antibody production.

Perhaps even more significantly, the multiple repeat domains of CA125 or other domains
could also be used for the development of a potential vaccine for patients with ovarian cancer. In
order to induce cellular and humoral immunity in humans to CA125, murine antibodies specific
for CA125 were utilized in anticipation of patient production of anti-idiotypic antibodies, thus
indirectly allowing the induction of an immune response to the CA125 molecule. With the
availability of recombinant CA125, especially domains which encompass epitope binding sites
for known murine antibodies, it will be feasible to more directly stimulate patients' immune
systems to CA125 and, as a result, extend the life of ovarian carcinoma patients.

The recombinant CA125 of the present invention may also be used to develop therapeutic
targets. Molecules like CA125, which are expressed on the surface of tumor cells, provide
potential targets for immune stimulation, drug delivery, biological modifier delivery or any agent
which can be specifically delivered to ultimately kill the tumor cells. Humanized or human
antibodies to CA125 epitopes could be used to deliver any drug or toxic agent, including
radioactive agents, to mediate direct killing of tumor cells. Natural ligands having a natural
binding affinity for domains on the CA125 molecule could also be utilized to deliver therapeutic
agents to tumor cells.

CA125 expression may further provide a survival or metastatic advantage to ovarian 
tumor cells. Antisense oligonucleotides derived from the CA125 repeat sequences could be used 
to down-regulate the expression of CA125. Further, antisense therapy could be used in 
association with a tumor cell delivery system of the type described above. 
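
As a rough illustration of the antisense concept, an oligonucleotide complementary to a stretch
of coding sequence can be derived by taking the reverse complement of that stretch. The Python
sketch below assumes a plain A/C/G/T DNA alphabet. The function name antisense_oligo and the
example target string are hypothetical placeholders chosen for illustration; the target is not a
sequence taken from the CA125 repeat.

```python
# Watson-Crick pairing for a plain DNA alphabet (assumption: no ambiguity codes).
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def antisense_oligo(sense: str) -> str:
    """Return the reverse complement of a sense-strand DNA sequence, read 5' to 3'."""
    sense = sense.upper()
    unknown = set(sense) - set(_COMPLEMENT)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return "".join(_COMPLEMENT[base] for base in reversed(sense))


# Hypothetical 12-base target, used only to exercise the function.
target = "ATGGCCTTAGCA"
oligo = antisense_oligo(target)
```

Applying the function twice returns the original sequence, which is a quick sanity check on any
candidate oligonucleotide design.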

Recombinant domains of the CA125 molecule also have the potential to identify small
molecules which bind to individual domains of the CA125 molecule. These small molecules
could also be used as delivery agents or as biological modifiers.

In one aspect of the present invention, a CA125 molecule is disclosed comprising: (a) an
extracellular amino terminal domain, comprising 5 genomic exons, wherein exon 1 comprises amino
acids #1-33 of SEQ ID NO: 299, exon 2 comprises amino acids #34-1593 of SEQ ID NO: 299, exon
3 comprises amino acids #1594-1605 of SEQ ID NO: 299, exon 4 comprises amino acids #1606-1617
of SEQ ID NO: 299, and exon 5 comprises amino acids #1618-1637 of SEQ ID NO: 299; (b) a
multiple repeat domain, wherein each repeat unit comprises 5 genomic exons, wherein exon 1
comprises amino acids #1-42 in any of SEQ ID NOS: 164 through 194; exon 2 comprises amino
acids #43-65 in any of SEQ ID NOS: 195 through 221; exon 3 comprises amino acids #66-123 in any
of SEQ ID NOS: 222 through 249; exon 4 comprises amino acids #124-135 in any of SEQ ID NOS:
250 through 277; and exon 5 comprises amino acids #136-156 in any of SEQ ID NOS: 278 through
298; and (c) a carboxy terminal domain comprising a transmembrane anchor with a short cytoplasmic
domain, and further comprising 9 genomic exons, wherein exon 1 comprises amino acids #1-11 of
SEQ ID NO: 300; exon 2 comprises amino acids #12-33 of SEQ ID NO: 300; exon 3 comprises
amino acids #34-82 of SEQ ID NO: 300; exon 4 comprises amino acids #83-133 of SEQ ID NO:
300; exon 5 comprises amino acids #134-156 of SEQ ID NO: 300; exon 6 comprises amino acids
#157-212 of SEQ ID NO: 300; exon 7 comprises amino acids #213-225 of SEQ ID NO: 300; exon 8
comprises amino acids #226-253 of SEQ ID NO: 300; and exon 9 comprises amino acids #254-284
of SEQ ID NO: 300.
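
The exon boundaries recited above can be checked for internal consistency. The following Python
sketch is offered only as an illustration: it encodes the recited amino acid ranges and confirms
that each domain's exons tile their sequence without gaps or overlaps. The names Exon,
tiles_contiguously, and exon_lengths are hypothetical helpers introduced here, not part of the
disclosure.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # inclusive (first, last) amino acid positions

# Exon boundaries as recited in the text for each domain.
AMINO_TERMINAL: List[Exon] = [(1, 33), (34, 1593), (1594, 1605), (1606, 1617), (1618, 1637)]
REPEAT_UNIT: List[Exon] = [(1, 42), (43, 65), (66, 123), (124, 135), (136, 156)]
CARBOXY_TERMINAL: List[Exon] = [
    (1, 11), (12, 33), (34, 82), (83, 133), (134, 156),
    (157, 212), (213, 225), (226, 253), (254, 284),
]


def tiles_contiguously(exons: List[Exon]) -> bool:
    """True if the exons start at position 1 and each begins right after the previous one ends."""
    expected_start = 1
    for first, last in exons:
        if first != expected_start or last < first:
            return False
        expected_start = last + 1
    return True


def exon_lengths(exons: List[Exon]) -> List[int]:
    """Length in residues of each exon."""
    return [last - first + 1 for first, last in exons]
```

Summing the lengths for the repeat unit gives 156 residues, matching the 156 amino acid repeat
unit described elsewhere in this summary; the amino terminal and carboxy terminal domains sum to
1637 and 284 residues, respectively.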

In another aspect of the present invention, the N-glycosylation sites of the amino terminal
domain marked (x) in Figure 8B are encoded at positions #81, #271, #320, #624, #795, #834, #938,
and #1,165 in SEQ ID NO: 299.

In another aspect of the present invention, the serine and threonine O-glycosylation pattern for
the amino terminal domain is marked (o) in SEQ ID NO: 299 in Figure 8B.

In another aspect of the present invention, exon 1 in the repeat domain comprises at least 31
different copies; exon 2 comprises at least 27 different copies; exon 3 comprises at least 28 different
copies; exon 4 comprises at least 28 different copies; and exon 5 comprises at least 21 different
copies.
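
The copy counts above can be cross-checked against the SEQ ID NO ranges recited for each exon of
the repeat unit (SEQ ID NOS: 164-194, 195-221, 222-249, 250-277, and 278-298). The short Python
sketch below does only that arithmetic; the names REPEAT_EXON_SEQ_IDS and copies_per_exon are
hypothetical, and the sketch assumes one distinct sequence per SEQ ID NO.

```python
from typing import Dict, Tuple

# Inclusive SEQ ID NO ranges recited for exons 1 through 5 of the repeat unit.
REPEAT_EXON_SEQ_IDS: Dict[int, Tuple[int, int]] = {
    1: (164, 194),
    2: (195, 221),
    3: (222, 249),
    4: (250, 277),
    5: (278, 298),
}


def copies_per_exon(ranges: Dict[int, Tuple[int, int]]) -> Dict[int, int]:
    """Count the sequences listed for each exon (assumes one sequence per SEQ ID NO)."""
    return {exon: last - first + 1 for exon, (first, last) in ranges.items()}


counts = copies_per_exon(REPEAT_EXON_SEQ_IDS)
```

Under that assumption the counts come out to 31, 27, 28, 28, and 21 for exons 1 through 5.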

In another aspect of the present invention, the repeat domain comprises 156 amino acid repeat
units which comprise epitope binding sites. The epitope binding sites are located in the C-enclosure
at amino acids #59-79 (marked C-C) in SEQ ID NO: 150 in Figure 5.

In another aspect, the 156 amino acid repeat unit comprises O-glycosylation sites at positions
#128, #129, #132, #133, #134, #135, #139, #145, #146, #148, #150, #151, and #156 in SEQ ID NO:
150 in Figure 5C. The 156 amino acid repeat unit further comprises N-glycosylation sites at
positions #33 and #49 in SEQ ID NO: 150 in Figure 5C. The repeat unit also includes at least one
conserved methionine (designated M) at position #24 in SEQ ID NO: 150 in Figure 5C.

In yet another aspect, the transmembrane domain of the carboxy terminal domain is located at
positions #230-252 (underlined) in SEQ ID NO: 300 of Figure 9B. The cytoplasmic domain of the
carboxy terminal domain comprises a highly basic sequence adjacent to the transmembrane domain at
positions #256-260 in SEQ ID NO: 300 of Figure 9B, serine and threonine phosphorylation sites at
positions #254, #255, and #276 in SEQ ID NO: 300 of Figure 9B, and tyrosine phosphorylation sites
at positions #264, #273, and #274 in SEQ ID NO: 300 of Figure 9B.

In another aspect of the present invention, an isolated nucleic acid of the CA125 gene is 
disclosed, which comprises a nucleotide sequence selected from the group consisting of: (a) the 
nucleotide sequences set forth in SEQ ID NOS: 49, 67, 81, 83-145, 147, 150, and 152; (b) a 
nucleotide sequence having at least 70% sequence identity to any one of the sequences in (a); (c) a 
degenerate variant of any one of (a) to (b); and (d) a fragment of any one of (a) to (c). 

In another aspect of the present invention, an isolated nucleic acid of the CA125 gene is
disclosed, comprising a sequence that encodes a polypeptide with the amino acid sequence selected
from the group consisting of: (a) the amino acid sequences set forth in SEQ ID NOS: 11-47, 50-80,
82, 146, 148, 149, 151, and 153-158; (b) an amino acid sequence having at least 50% sequence
identity to any one of the sequences in (a); (c) a conservative variant of any one of (a) to (b);
and (d) a fragment of any one of (a) to (c).
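
The identity thresholds recited above (at least 70% for nucleotide sequences and at least 50% for
amino acid sequences) presuppose some way of scoring two sequences against each other. This
document does not specify an alignment method or an identity formula, so the Python sketch below
is only one common convention, stated as an assumption: the two sequences are already aligned to
equal length with '-' marking gaps, and identity is the fraction of aligned columns holding the
same residue. The function names percent_identity and meets_threshold are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns with identical residues; '-' marks a gap.

    Assumes the inputs are pre-aligned to the same length. Columns that are
    gaps in both sequences are ignored; a gap opposite a residue is a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_threshold(a: str, b: str, threshold: float) -> bool:
    """True if the pre-aligned pair reaches the given percent identity."""
    return percent_identity(a, b) >= threshold
```

For example, two aligned 10-base strings differing at one position score 90% and would satisfy a
70% threshold; other conventions (for instance, dividing by the shorter sequence length) can give
different values for the same pair.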

In yet another aspect, a vector comprising the nucleic acid of the CA125 gene is disclosed. 
The vector may be a cloning vector, a shuttle vector, or an expression vector. A cultured cell 
comprising the vector is also disclosed. 

In yet another aspect, a method of expressing CA125 antigen in a cell is disclosed, comprising
the steps of: (a) providing at least one nucleic acid comprising a nucleotide sequence selected from
the group consisting of: (i) the nucleotide sequences set forth in SEQ ID NOS: 49, 67, 81, 83-145,
147, 150, and 152; (ii) a nucleotide sequence having at least 70% sequence identity to any one of the
sequences in (i); (iii) a degenerate variant of any one of (i) to (ii); and (iv) a fragment of any one of
(i) to (iii); (b) providing cells comprising an mRNA encoding the CA125 antigen; and (c) introducing
the nucleic acid into the cells, wherein the CA125 antigen is expressed in the cells.

In yet another aspect, a purified polypeptide of the CA125 gene is disclosed, comprising an amino
acid sequence selected from the group consisting of: (a) the amino acid sequences set forth in SEQ ID
NOS: 11-48, 50, 68-80, 82, 146, 148, 149, 150, 151, and 153-158; (b) an amino acid sequence having
at least 50% sequence identity to any one of the sequences in (a); (c) a conservative variant of any
one of (a) to (b); and (d) a fragment of any one of (a) to (c).

In another aspect, a purified antibody is disclosed that selectively binds to an epitope in the
receptor-binding domain of CA125 protein, wherein the epitope is within the amino acid sequence
selected from the group consisting of: (a) the amino acid sequences set forth in SEQ ID NOS: 11-48,
50, 68-80, 146, 151, and 153-158; (b) an amino acid sequence having at least 50% sequence identity to
any one of the sequences in (a); (c) a conservative variant of any one of (a) to (b); and (d) a
fragment of any one of (a) to (c).

A diagnostic for detecting and monitoring the presence of CA125 antigen is also disclosed,
which comprises recombinant CA125 comprising at least one repeat unit of the CA125 repeat domain
including epitope binding sites selected from the group consisting of amino acid sequences set forth
in SEQ ID NOS: 11-48, 50, 68-80, 82, 146, 150, 151, 153-161, and 162 (amino acids #1,643-11,438).

A therapeutic vaccine to treat mammals with elevated CA125 antigen levels or at risk of
developing a disease or disease recurrence associated with elevated CA125 antigen levels is also
disclosed. The vaccine comprises recombinant CA125 repeat domains including epitope binding
sites, wherein the repeat domains are selected from the group of amino acid sequences consisting of
SEQ ID NOS: 11-48, 50, 68-80, 82, 146, 148, 149, 150, 151, 153-161, and 162 (amino acids #1,643-
11,438) and amino acids #175-284 of SEQ ID NO: 300. Mammals include animals and humans.

In another aspect of the present invention, an antisense oligonucleotide is disclosed that
inhibits the expression of CA125 encoded by: (a) the nucleotide sequences set forth in SEQ ID
NOS: 49, 67, 81, 83-145, 147, 150, and 152; (b) a nucleotide sequence having at least 70% sequence
identity to any one of the sequences in (a); (c) a degenerate variant of any one of (a) to (b); and (d) a
fragment of any one of (a) to (c).

The preceding and further aspects of the present invention will be apparent to those of
ordinary skill in the art from the following description of the presently preferred embodiments of 
the invention, such description being merely illustrative of the present invention. 

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 illustrates the cyanogen bromide digested products of CA125 on Western blot
probed with M11 and OC125 antibodies. Table 1 shows the amino acid sequence derived from the
amino terminal end of the 40 kDa cyanogen bromide peptide along with internal sequences obtained
after protease digestion of the 40 kDa fragment (SEQ ID NOS: 1-4). SEQ ID NO: 1 is the amino
terminal sequence derived from the 40 kDa peptide and SEQ ID NOS: 2, 3, and 4 reflect internal amino
acid sequences derived from peptides after protease digestion of the 40 kDa fragment. Table 1
further provides a translation of the EST (BE005912) with homologous sequences (SEQ ID NOS: 5
and 6) either boxed or underlined. Protease cleavage sites are indicated by arrows.

Figure 2A illustrates PCR amplification of products generated from primers utilizing the EST
sequence referred to in Figure 1, the amino acid sequence obtained from the 40 kDa fragment and
EST sequence AA# 640762. Lanes 1-2: normal; 3: serous ovarian carcinoma; 4: serous ovarian
carcinoma; 5: mucinous ovarian carcinoma; 6: β-tubulin control. The anticipated 400 bp band is
present in lane 3 and less abundantly in lane 4.

Figure 2B illustrates the RT-PCR that was performed to determine the presence or absence of
CA125 transcripts in primary culture cells of ovarian tumors. This expression was compared to
β-tubulin expression as an internal control. Lanes 1, 3, 5, 7, and 9 represent the primary ovarian tumor
cell lines. Lanes 2, 4, 6, and 8 represent peripheral blood mononuclear cell lines derived from the
corresponding patients in lanes 1, 3, 5, and 7. Lane 10 represents fibroblasts from the patient tumor
in lane 9. Lanes 11 and 12 are CaOV3 and a primary tumor specimen, respectively.

Figure 3 illustrates repeat sequences determined by sequencing cloned cDNA from the 400 bp
band in Figure 2B. Placing of repeat sequences in a contiguous fashion was accomplished by PCR
amplification and sequencing of overlap areas between two repeat sequences. A sample of the
complete repeat sequences is shown in SEQ ID NOS: 158, 159, 160, and 161, which were obtained in
this manner and placed next to each other based on overlap sequences. The complete list of repeat
sequences that was obtained is shown in Table 21 (SEQ ID NO: 162).

Figure 4 illustrates three Western immunoblot patterns: Panel A = probed with M11, Panel B
= probed with OC125 and Panel C = probed with antibody ISOBM 9.2. Each panel represents E. coli 
extracts as follows: lane 1 = E. coli extract from bacteria with the plasmid PQE-30 only. Lane 2 = E. 
coli extract from bacteria with the plasmid PQE-30 which includes the CA125 repeat unit. Lane 3 = 
E. coli extract from bacteria with the plasmid PQE-30 which includes the TADG-14 protease 
unrelated to CA125. Panel D shows a Coomassie blue stain of a PAGE gel of E. coli extract derived 
from either PQE-30 alone or from bacteria infected with PQE-30 - CA125 repeat (recombinant 
CA125 repeat). 

Figure 5 represents Western blots of the CA125 repeat sequence that were generated to
determine the position of the M11 epitope within the recombinant CA125 repeat. The expressed
protein was bound to Ni-NTA agarose beads. The protein was left undigested or digested with
Asp-N or Lys-C. The protein remaining bound to the beads was loaded into lanes 1, 2, or 3
corresponding to undigested, Asp-N digested and Lys-C digested, respectively. The supernatants
from the digestions were loaded in lanes 4, 5, and 6 corresponding to undigested, Asp-N digested and
Lys-C digested, respectively. The blots were probed with either anti-His tag antibody (A) or M11
antibody (B). Panel C shows a typical repeat sequence corresponding to SEQ ID NO: 150 with
each exon defined by arrows. All proteolytic aspartic acid and lysine sites are marked with
overhead arrows or dashes. In the lower panel, the O-glycosylation sites in exons 4 and 5 are
marked with O; the N-glycosylation sites are marked with X plus the amino acid number in the
repeat (#12, 33, and 49); the conserved methionine is designated with M plus the amino acid
number (M#24); and the cysteine enclosure, which is also present in all repeats and encompasses
19 amino acids between the cysteines, is marked with C-C (amino acids #59-79). The epitopes
for M11 and OC125 are located in the latter part of the C-enclosure or downstream from the
C-enclosure.

Figure 6 illustrates a Northern blot analysis of RNA derived from either normal ovary (N) or
ovarian carcinoma (T) probed with a P32 cDNA repeat sequence of CA125. Total RNA samples
(10 µg) were size separated by electrophoresis on a formaldehyde 1.2% agarose gel. After blotting to
Hybond N, the lanes were probed with P32 radiolabeled 400 bp repeat (see Figure 2). Lane 1
represents RNA from normal ovarian tissue, and lane 2 represents RNA from serous ovarian tumor
tissue.

Figure 7A is a schematic diagram of a typical repeat unit for CA125 showing the N-
glycosylation sites at the amino end and the totally conserved methionine (M). Also shown is the
proposed cysteine enclosed loop with antibody binding sites for OC125 and M11. Also noted are the
highly O-glycosylated residues at the carboxy end of the repeat.

Figure 7B represents the genomic structure and exon configuration of a 156 amino acid repeat 
sequence of CA125 (SEQ ID NO: 163), which comprises a standard repeat unit. 

Figure 7C lists the individual known sequences for each exon, which have been determined as 
follows: Exon 1 - SEQ ID NOS: 164-194; Exon 2 - SEQ ID NOS: 195-221; Exon 3 - SEQ ID NOS: 
222-249; Exon 4 - SEQ ID NOS: 250-277; and Exon 5 - SEQ ID NOS: 278-298. 

Figure 8A shows the genomic structure of the amino terminal end of the CA125 gene. It also
indicates the amino acid composition of each exon in the extracellular domain.

Figure 8B illustrates the amino acid composition of the amino terminal domain (SEQ ID NO: 
299) with each potential O-glycosylation site marked with a superscript (o) and N-glycosylation sites 
marked with a superscript (x). T-TALK sequences are underlined. 

Figure 9A illustrates the genomic exon structure of the carboxy-terminal domain of the
CA125 gene. It includes a diagram showing the extracellular portion, the potential cleavage site, the
transmembrane domain and the cytoplasmic tail.

Figure 9B illustrates the amino acid composition of the carboxy terminal domain (SEQ ID 
NO: 300) including the exon boundaries, O-glycosylation sites (o), and N-glycosylation sites (x). 
The proposed transmembrane domain is underlined.

Figure 10 illustrates the proposed structure of the CA125 molecule based on the open reading
frame sequence described herein. As shown, the molecule is dominated by a major repeat domain in
the extracellular space along with a highly glycosylated amino terminal repeat. The molecule is
anchored by a transmembrane domain and also includes a cytoplasmic tail with potential for
phosphorylation.

DETAILED DESCRIPTION OF THE INVENTION 

In accordance with the present invention, conventional molecular biology, microbiology,
and recombinant DNA techniques may be used that will be apparent to those skilled in the
relevant art. Such techniques are explained fully in the literature (see, e.g., Maniatis, Fritsch &
Sambrook, "Molecular Cloning: A Laboratory Manual" (1982); "DNA Cloning: A Practical
Approach," Volumes I and II (D. N. Glover ed. 1985); "Oligonucleotide Synthesis" (M. J. Gait
ed. 1984); "Nucleic Acid Hybridization" (B. D. Hames & S. J. Higgins eds. (1985));
"Transcription and Translation" (B. D. Hames & S. J. Higgins eds. (1984)); "Animal Cell
Culture" (R. I. Freshney, ed. (1986)); "Immobilized Cells And Enzymes" (IRL Press, (1986));
and B. Perbal, "A Practical Guide To Molecular Cloning" (1984)).

Therefore, if appearing herein, the following terms shall have the definitions set out
below.

A "vector" is a replicon, such as a plasmid, phage or cosmid, to which another DNA
segment may be attached so as to bring about the replication of the attached segment.

A "DNA molecule" refers to the polymeric form of deoxyribonucleotides (adenine,
guanine, thymine, or cytosine) in either single-stranded form or a double-stranded helix. This
term refers only to the primary and secondary structure of the molecule, and does not limit it to
any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in
linear DNA molecules (e.g., restriction fragments), viruses, plasmids, and chromosomes.

As used herein, the term "gene" shall mean a region of DNA encoding a polypeptide
chain.

"Messenger RNA" or "mRNA" shall mean an RNA molecule that encodes one or
more polypeptides.

"DNA polymerase" shall mean an enzyme which catalyzes the polymerization of
deoxyribonucleotide triphosphates to make DNA chains using a DNA template.

"Reverse transcriptase" shall mean an enzyme which catalyzes the polymerization of
deoxy- or ribonucleotide triphosphates to make DNA or RNA chains using an RNA or DNA
template.

"Complementary DNA" or "cDNA" shall mean the DNA molecule synthesized by
polymerization of deoxyribonucleotides by an enzyme with reverse transcriptase activity.

An "isolated nucleic acid" is a nucleic acid the structure of which is not identical to that of
any naturally occurring nucleic acid or to that of any fragment of a naturally occurring genomic
nucleic acid spanning more than three separate genes. The term therefore covers, for example,
(a) a DNA which has the sequence of part of a naturally occurring genomic DNA molecule but is
not flanked by both of the coding sequences that flank that part of the molecule in the genome of
the organism in which it naturally occurs; (b) a nucleic acid incorporated into a vector or into the
genomic DNA of a prokaryote or eukaryote in a manner such that the resulting molecule is not
identical to any naturally occurring vector or genomic DNA; (c) a separate molecule such as a
cDNA, a genomic fragment, a fragment produced by polymerase chain reaction (PCR), or a
restriction fragment; and (d) a recombinant nucleotide sequence that is part of a hybrid gene, i.e.,
a gene encoding a fusion protein.

"Oligonucleotide", as used herein in referring to the probes or primers of the present 
invention, is defined as a molecule comprised of two or more deoxy- or ribonucleotides, 
preferably more than ten. Its exact size will depend upon many factors which, in turn, depend 
upon the ultimate function and use of the oligonucleotide.

"DNA fragment" includes polynucleotides and/or oligonucleotides and refers to a plurality 
of joined nucleotide units formed from naturally-occurring bases and cyclofuranosyl groups 
joined by native phosphodiester bonds. This term effectively refers to naturally-occurring 
species or synthetic species formed from naturally-occurring subunits. "DNA fragment" also
refers to purine and pyrimidine groups and moieties which function similarly but which have
non-naturally-occurring portions. Thus, DNA fragments may have altered sugar moieties or inter-
sugar linkages. Exemplary among these are the phosphorothioate and other sulfur containing 
species. They may also contain altered base units or other modifications, provided that biological 
activity is retained. DNA fragments may also include species which include at least some 
modified base forms. Thus, purines and pyrimidines other than those normally found in nature
may be so employed. Similarly, modifications on the cyclofuranose portions of the nucleotide
subunits may also occur as long as biological function is not eliminated by such modifications. 

"Primer" shall refer to an oligonucleotide, whether occurring naturally or produced 
synthetically, which is capable of acting as a point of initiation of synthesis when placed under 
conditions in which synthesis of a primer extension product, which is complementary to a nucleic
acid strand, is induced, i.e., in the presence of nucleotides and an inducing agent such as a DNA 
polymerase and at a suitable temperature and pH. The primer may be either single-stranded or 
double-stranded and must be sufficiently long to prime the synthesis of the desired extension 
product in the presence of the inducing agent. The exact length of the primer will depend upon 
many factors, including temperature, the source of primer and the method used. For example, for
diagnostic applications, depending on the complexity of the target sequence, the oligonucleotide
primer typically contains 10-25 or more nucleotides, although it may contain fewer nucleotides.

The primers herein are selected to be "substantially" complementary to different strands of 
a particular target DNA sequence. This means that the primers must be sufficiently 
complementary to hybridize with their respective strands. Therefore, the primer sequence need
not reflect the exact sequence of the template. For example, a non-complementary nucleotide
fragment may be attached to the 5' end of the primer, with the remainder of the primer sequence
being complementary to the strand. Alternatively, non-complementary bases or longer sequences
can be interspersed into the primer, provided that the primer sequence has sufficient
complementarity with the sequence to hybridize therewith and thereby form the template for the
synthesis of the extension product.

As used herein, the term "hybridization" refers generally to a technique wherein denatured
RNA or DNA is combined with complementary nucleic acid sequence which is either free in
solution or bound to a solid phase. As recognized by one skilled in the art, complete
complementarity between the two nucleic acid sequences is not a pre-requisite for hybridization
to occur. The technique is ubiquitous in molecular genetics and its use centers around the
identification of particular DNA or RNA sequences within complex mixtures of nucleic acids.

As used herein, "restriction endonucleases" and "restriction enzymes" shall refer to
bacterial enzymes which cut double-stranded DNA at or near a specific nucleotide sequence.

"Purified polypeptide" refers to any peptide generated from CA125 either by proteolytic
cleavage or chemical cleavage.

"Degenerate variant" refers to any amino acid variation in the repeat sequence, which 
fulfills the homology exon structure and conserved sequences and is recognized by the M11,
OC125 and ISOBM series of antibodies.

"Fragment" refers to any part of the CA125 molecule identified in a purification scheme.

"Conservative variant antibody" shall mean any antibody that fulfills the criteria of M11,
OC125 or any of the ISOBM antibody series.

MATERIALS AND METHODS

A. Tissue collection, RNA Isolation and cDNA Synthesis 

Both normal and ovarian tumor tissues were utilized for cDNA preparation. Tissues were 
routinely collected and stored at -80°C according to a tissue collection protocol. 

Total RNA isolation was performed according to the manufacturer's instructions using the 
TRIzol Reagent purchased from GibcoBRL (Catalog #15596-018). In some instances, mRNA
was isolated using oligo dT affinity chromatography. The amount of RNA recovered was 
quantitated by UV spectrophotometry. First strand complementary DNA (cDNA) was 
synthesized using 5.0 ug of RNA and random hexamer primers according to the manufacturer's 
protocol utilizing a first strand synthesis kit obtained from Clontech (Catalog #K1402-1). The 
purity of the cDNA was evaluated by PCR using primers specific for the β-tubulin gene. These
primers span an intron such that the PCR products generated from pure cDNA can be 
distinguished from cDNA contaminated with genomic DNA. 

B. Identification and Ordering of CA125 Repeat Units 

It has been demonstrated that the 2-5 million dalton CA125 glycoprotein (with repeat 
domains) can be chemically segmented into glycopeptide fragments using cyanogen bromide. As 
shown in Figure 1, several of these fragments, in particular the 40 kDa and 60 kDa fragments, 
still bind to the two classical antibody groups defined by OC125 and M11.

To convert CA125 into a consistent glycopeptide, the CA125 parent molecule was 
processed by cyanogen bromide digestion. This cleavage process resulted in two main fractions 
on Coomassie blue staining following polyacrylamide gel electrophoresis. An approximately 60
kDa band and a more dominant 40 kDa band were identified as shown in Figure 1. When a
Western blot of these bands was probed with either OC125 or M11 antibodies (both of which
define the CA125 molecule), these bands bound both antibodies. The 40 kDa band was 
significantly more prominent than the 60 kDa band. These data thus established the likelihood of 
these bands (most especially the 40 kDa band) as being an authentic cleavage peptide of the 
CA125 molecule, which retained the identifying characteristic of OC125 and M11 binding.

The 40 kDa and 60 kDa bands were excised from PVDF blots and submitted to amino 
terminal and internal peptide amino acid sequencing as described and practiced by Harvard
Sequencing (Harvard Microchemistry Facility and The Biological Laboratories, 16 Divinity
Avenue, Cambridge, Massachusetts 02138). Sequencing was successful only for the 40 kDa
band where both amino terminal sequences and some internal sequences were obtained as shown 
in Table 1 at SEQ ID NOS: 1-4. The 40 kDa fragment of the CA125 protein was found to have 
homology to two translated EST sequences (GenBank Accession Nos. BE005912 and 
AA640762). Visual examination of these translated sequences revealed similar amino acid
regions, indicating a possible repetitive domain. The nucleotide and amino acid sequences for
EST GenBank Accession No. BE005912 (corresponding to SEQ ID NO: 5 and SEQ ID NO: 6,
respectively) are illustrated in Table 1. Common sequences are boxed or underlined.

In an attempt to identify other individual members of this proposed repeat family, two 
oligonucleotide primers were synthesized based upon regions of homology in these EST
sequences. Shown in Table 2A, the primer sequences correspond to SEQ ID NOS: 7 and 8
(sense primers) and SEQ ID NOS: 9 and 10 (antisense primers). Repeat sequences were
amplified in accordance with the methods disclosed in the following references: Shigemasa K et
al., p21: A monitor of p53 dysfunction in ovarian neoplasia, Int. J. Gynecol. Cancer 7:296-303
(1997) and Shigemasa K et al., p16 Overexpression: A potential early indicator of transformation
in ovarian carcinoma, J. Soc. Gynecol. Invest. 4:95-102 (1997). Ovarian tumor cDNA obtained
from a tumor cDNA bank was used.

Amplification was accomplished in a Thermal Cycler (Perkin-Elmer Cetus). The reaction
mixture consisted of 1U Taq DNA Polymerase in storage buffer A (Promega), 1X Thermophilic
DNA Polymerase 10X Mg free buffer (Promega), 300mM dNTPs, 2.5mM MgCl2, and 0.25mM
each of the sense and antisense primers for the target gene. A 20 ul reaction included 1 ul of
cDNA synthesized from 50 ng of serous tumor mRNA as the template. PCR
reactions required an initial denaturation step at 94°C/1.5 min. followed by 35 cycles of 94°C/0.5
min., 48°C/0.5 min., 72°C/0.5 min. with a final extension at 72°C/7 min. Three bands were
initially identified (≈400 bp, ≈800 bp, and ≈1200 bp) and isolated. After size analysis by agarose
gel electrophoresis, these bands as well as any other products of interest were then ligated into a
T-vector plasmid (Promega) and transformed into competent DH5α strain of E. coli cells. After
growth on selective media, individual colonies were cultured overnight at 37°C, and plasmid
DNA was extracted using the QIAprep Spin Miniprep kit (Qiagen). Positive clones were
identified by restriction digests using Apa I and Sac I. Inserts were sequenced using an ABI
automatic sequencer, Model 377, T7 primers, and a Big Dye Terminator Cycle Sequencing Kit
(Applied Biosystems).
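
By way of illustration only, the cycling program recited above can be written down as data and its total programmed hold time computed. The sketch below is not part of the described method; it assumes that ramping time between temperatures is ignored, and the names used (CYCLE_STEPS, total_hold_minutes) are hypothetical.

```python
# Cycling program as recited in the text: 94°C/1.5 min, then 35 cycles of
# 94°C/0.5 min, 48°C/0.5 min, 72°C/0.5 min, then 72°C/7 min.
# Assumption: only programmed hold times are counted (no ramp time).
INITIAL_DENATURATION_MIN = 1.5
CYCLES = 35
CYCLE_STEPS = [
    ("denaturation", 94, 0.5),  # (step name, temperature in °C, minutes)
    ("annealing", 48, 0.5),
    ("extension", 72, 0.5),
]
FINAL_EXTENSION_MIN = 7.0


def total_hold_minutes(initial, cycles, steps, final):
    """Sum the programmed hold times of a three-step PCR program."""
    per_cycle = sum(minutes for _name, _temp, minutes in steps)
    return initial + cycles * per_cycle + final


total = total_hold_minutes(
    INITIAL_DENATURATION_MIN, CYCLES, CYCLE_STEPS, FINAL_EXTENSION_MIN
)
print(f"Programmed hold time: {total:.1f} min")  # 1.5 + 35 * 1.5 + 7 = 61.0
```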

Obtained sequences were analyzed using the Pileup program of the Wisconsin Genetics
Computer Group (GCG). Repeat units were ordered using primers designed against two highly
conserved regions within the nucleotide sequence of these identified repeat units. Shown in
Table 2B, the sense and antisense primers (5'-GTCTCTATGTCAATGGTTTCACCC-3' /
5'-TAGCTGCTCTCTGTCCAGTCC-3', SEQ ID NOS: 301 and 302, respectively) faced away from
one another within any one repeat, creating an overlap sequence and thus enabling amplification
across the junction of any two repeat units. PCR reactions, cloning, sequencing, and analysis
were performed as described above.
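
The logic of outward-facing primers may be easier to follow with a toy model. The sketch below is illustrative only: the two primer sequences are those recited above, but the flanking and spacer sequences of the two "repeats" are hypothetical placeholders, not CA125 sequence. It shows that a single repeat yields no product, whereas two adjacent repeats yield a product spanning their junction.

```python
# Primers as recited in the text (SEQ ID NOS: 301 and 302).
SENSE = "GTCTCTATGTCAATGGTTTCACCC"
ANTISENSE = "TAGCTGCTCTCTGTCCAGTCC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template, sense, antisense):
    """Return the top-strand product bounded by the primer pair, or None.

    The sense primer is searched for on the top strand; the antisense
    primer site appears on the top strand as its reverse complement and
    must lie downstream of the sense primer for a product to form.
    """
    start = template.find(sense)
    if start == -1:
        return None
    site = reverse_complement(antisense)
    end = template.find(site, start + len(sense))
    if end == -1:
        return None
    return template[start:end + len(site)]


def toy_repeat(left, middle, right):
    """Build a hypothetical repeat in which the primers face away from
    each other: the antisense site lies upstream of the sense site."""
    return left + reverse_complement(ANTISENSE) + middle + SENSE + right


# Hypothetical flanks and spacers, chosen only to make the junction visible.
repeat_a = toy_repeat("AAAC", "ACGTACGT", "TTTG")
repeat_b = toy_repeat("CCCA", "TGCATGCA", "GGGT")

single = amplicon(repeat_a, SENSE, ANTISENSE)             # no product
junction = amplicon(repeat_a + repeat_b, SENSE, ANTISENSE)  # spans junction
print(single)
print(junction)
```

In this model the product begins at the sense primer in the upstream repeat, crosses the junction, and ends at the antisense primer site in the downstream repeat, which is the property relied upon to order adjacent repeat units.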

C. Identification and Assembly of the CA125 Amino Terminal Domain 

In search of open reading frames containing sequences in addition to CA125 repeat units,
database searches were performed using the BLAST program available at the National Center for
Biotechnology Information (www.ncbi.nlm.nih.gov/). Using a repeat unit as the query sequence,
cosmid AC008734 was identified as having multiple repeat sequences throughout its 35 unordered
contiguous pieces of DNA, also known as contigs. One of these contigs, #32, was found to have
exons 1 and 2 of a repeat region at its 3' end. Contig #32 was also found to contain a large open
reading frame (ORF) upstream of the repeat sequence. PCR was again used to verify the existence of
this ORF and confirm its connection to the repeat sequence. The specific primers recognized the 3'
end of this ORF (5'-CAGCAGAGACCAGCACGAGTACTC-3') (SEQ ID NO: 51) and sequence
within the repeat (5'-TCCACTGCCATGGCTGAGCT-3') (SEQ ID NO: 52). The remainder of the
amino-terminal domain was assembled from this contig in a similar manner. With each PCR
confirmation, a new primer (see Table 10A) was designed against the assembled sequence and used
in combination with a primer designed against another upstream potential ORF (Set 1:
5'-CCAGCACAGCTCTTCCCAGGAC-3' / 5'-GGAATGGCTGAGCTGACGTCTG-3' (SEQ ID NO:
53 and SEQ ID NO: 54); Set 2: 5'-CTTCCCAGGACAACCTCAAGG-3' /
5'-GCAGGATGAGTGAGCCACGTG-3' (SEQ ID NO: 55 and SEQ ID NO: 56); Set 3:
5'-GTCAGATCTGGTGACCTCACTG-3' / 5'-GAGGCACTGGAAAGCCCAGAG-3' (SEQ ID NO:
57 and SEQ ID NO: 58)). Potential adjoining sequence (contig #7 containing EST AU133673) was
also identified using contig #32 sequence as query sequence in database searches. Confirmation
primers were designed and used in a typical manner (5'-CTGATGGCATTATGGAACACATCAC-3'
/ 5'-CCCAGAACGAGAGACCAGTGAG-3') (SEQ ID NO: 59 and SEQ ID NO: 60).

In order to identify the 5' end of the CA125 sequence, 5' Rapid Amplification of cDNA Ends
(FirstChoice™ RLM-RACE Kit, Ambion) was performed using tumor cDNA. The primary PCR
reaction used a sense primer supplied by Ambion (5'-GCTGATGGCGATGAATGAACACTG-3')
(SEQ ID NO: 61) and an anti-sense primer specific to confirmed contig #32 sequence
(5'-CCCAGAACGAGAGACCAGTGAG-3') (SEQ ID NO: 62). The secondary PCR was then
performed using nested primers: a sense primer from Ambion
(5'-CGCGGATCCGAACACTGCGTTTGCTGGCTTTGATG-3') (SEQ ID NO: 63) and an anti-sense
primer specific to confirmed contig #7 sequence (5'-CCTCTGTGTGCTGCTTCATTGGG-3') (SEQ ID
NO: 64). The RACE PCR product (a band of approximately 300 bp) was cloned and sequenced as
previously described.

D. Identification and Assembly of the CA125 Carboxy Terminal Domain

Database searches using confirmed repeat units as query also identified a cDNA sequence
(GenBank AK024365) containing other repeat units as well as a potential carboxy terminal sequence.
The contiguous nature of this sequence with assembled CA125 was confirmed using PCR
(5'-GGACAAGGTCACCACACTCTAC-3' / 5'-GCAGATCCTCCAGGTCTAGGTGTG-3') (SEQ ID
NO: 303 and SEQ ID NO: 304, respectively) as well as contig and EST analysis.

E. Expression of 6xHis-tagged CA125 repeat in E. coli

The open reading frame of a CA125 repeat shown in Table 11 was amplified by PCR with the
sense primer (5'-ACCGGATCCATGGGCCACACAGAGCCTGGCCC-3') (SEQ ID NO: 65) and the
antisense primer (5'-TGTAAGCTTAGGCAGGGAGGATGGAGTCC-3') (SEQ ID NO: 66). PCR
was performed in a reaction mixture consisting of ovarian tumor cDNA derived from 50 ng of
mRNA, 5 pmol each of sense and antisense primers for the CA125 repeat, 0.2 mmol of dNTPs, and
0.625 U of Taq polymerase in 1x buffer in a final volume of 25 ml. This mixture was subjected to 1
minute of denaturation at 95°C followed by 30 cycles of PCR consisting of the following:
denaturation for 30 seconds at 95°C, 30 seconds of annealing at 62°C, and 1 minute of extension at
72°C with an additional 7 minutes of extension on the last cycle. The product was electrophoresed
through a 2% agarose gel for separation. The PCR product was purified and digested with the
restriction enzymes Bam HI and Hind III. This digested PCR product was then ligated into the
expression vector pQE-30, which had also been digested with Bam HI and Hind III. This clone
would allow for expression of recombinant 6xHis-tagged CA125 repeat. Transformed E. coli
(JM109) were grown to an OD600 of 1.5-2.0 at 37°C and then induced with IPTG (0.1 mM) for 4-6
hours at 25°C to produce recombinant protein. Whole E. coli lysate was electrophoresed through a
12% SDS polyacrylamide gel and Coomassie stained to detect highly expressed proteins.
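
The cloning strategy above depends on the two primers carrying the recognition sites of the enzymes used for digestion. As an illustrative check only, the sketch below searches the recited primers (SEQ ID NOS: 65 and 66) for the standard recognition sequences of Bam HI (GGATCC) and Hind III (AAGCTT). The helper name find_sites is hypothetical.

```python
# Standard recognition sequences of the two enzymes named in the text.
RECOGNITION_SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}

# Primers as recited in the text (SEQ ID NOS: 65 and 66).
SENSE = "ACCGGATCCATGGGCCACACAGAGCCTGGCCC"
ANTISENSE = "TGTAAGCTTAGGCAGGGAGGATGGAGTCC"


def find_sites(primer):
    """Map each enzyme whose site occurs in the primer to the 0-based
    index of the first occurrence."""
    return {
        enzyme: primer.find(site)
        for enzyme, site in RECOGNITION_SITES.items()
        if site in primer
    }


sense_sites = find_sites(SENSE)          # {'BamHI': 3}
antisense_sites = find_sites(ANTISENSE)  # {'HindIII': 3}

# The three bases immediately following the Bam HI site in the sense primer.
after_bamhi = SENSE[sense_sites["BamHI"] + 6:sense_sites["BamHI"] + 9]

print(sense_sites, antisense_sites, after_bamhi)
```

Each primer carries exactly one of the two sites near its 5' end, consistent with directional ligation into a vector cut with the same pair of enzymes, and the sense primer reads ATG directly after its Bam HI site.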

F. Western Blot Analysis

Proteins were separated on a 12% SDS-PAGE gel and electroblotted at 100V for 40 
minutes at 4°C to nitrocellulose membrane. Blots were blocked overnight in phosphate-buffered 
saline (PBS) pH 7.3 containing 5% non-fat milk. CA125 antibodies M11, OC125, or ISOBM 9.2
were incubated with the membrane at 5 ug/ml in 5% milk/PBS-T (PBS plus 0.1%
TX-100) for 2 hours at room temperature. The blot was washed for 30 minutes
with several changes of PBS and incubated with a 1:10,000 dilution of horseradish peroxidase
(HRP) conjugated goat anti-mouse IgG antibody (Bio-Rad) for 1 hour at room temperature.
Blots were washed for 30 minutes with several changes of PBS and incubated with a
chemiluminescent substrate (ECL from Amersham Pharmacia Biotech) before a 10-second
exposure to X-ray film for visualization.

Figure 4 illustrates three Western immunoblot patterns of the recombinant CA125 repeat 
purified from E. coli lysate (lane 2) compared to E. coli lysate with no recombinant protein (lane 
1-negative control) and a recombinant protein TADG-14 which is unrelated to CA125 (lane 3). 
As shown, the M11 antibody, the OC125 antibody and the antibody ISOBM 9.2 (an OC125-like
antibody) all recognized the CA125 recombinant repeat (lane 2), but did not recognize either the
E. coli lysate (lane 1) or the unrelated TADG-14 recombinant (lane 3). These data confirm that
the recombinant repeat encodes both independent epitopes for CA125, the OC125 epitope and the
M11 epitope.

G. Northern Blot Analysis 

Total RNA samples (approximately 10 ug) were separated by electrophoresis through a
6.3% formaldehyde, 1.2% agarose gel in 0.02 M MOPS, 0.05 M sodium acetate (pH 7.0), and
0.001 M EDTA. The RNAs were then blotted to Hybond-N (Amersham) by capillary action in
20x SSPE and fixed to the membrane by baking for 2 hours at 80°C. A PCR product
representing one 400 bp repeat of the CA125 molecule was radiolabeled using the Prime-a-Gene
Labeling System available from Promega (cat. #U1100). The blot was probed and stripped
according to the ExpressHyb Hybridization Solution protocol available from Clontech (Catalog
#8015-1).

RESULTS 

In 1997, a system was described by a co-inventor of the present invention and others for 
purification of CA125 (primarily from patient ascites fluid), which when followed by cyanogen 
bromide digestion, resulted in peptide fragments of CA125 of 60 kDa and 40 kDa [O'Brien TJ et al.,
More than 15 years of CA125: What is known about the antigen, its structure and its function, Int J
Biological Markers 13(4):188-195 (1998)]. Both fragments were identifiable by Coomassie blue
staining on polyacrylamide gels and by Western blot. Both fragments were shown to bind both
OC125 and M11 antibodies, indicating both major classes of epitopes were preserved in the released
peptides (Figure 1). 

Protein sequencing of the 40 kDa band yielded both amino terminal sequences and some 
internal sequences generated by protease digestion (Table 1 - SEQ ID NOS: 1-4). Insufficient yields 
of the 60 kDa band resulted in unreliable sequence information. Unfortunately, efforts to amplify 
PCR products utilizing redundant primers designed to these sequences were not successful. In mid 
2000, an EST (#BE005912) was entered into the GCG database, which contained homology to the 40 
kDa band sequence as shown in Table 1 (SEQ ID NOS: 5 and 6). The translation of this EST 
indicated good homology to the amino terminal sequence of the 40 kDa repeat (e.g. PGSRKFKTTE) 
with only one amino acid difference (i.e. an asparagine is present instead of phenylalanine in the EST 
sequence). Also, some of the internal sequences are partially conserved (e.g. SEQ ID NO: 2 and to a 
lesser extent, SEQ ID NO: 3 and SEQ ID NO: 4). More importantly, all the internal sequences are 
preceded by a basic amino acid (Table 1, indicated by arrows) appropriate for proteolysis by the 
trypsin used to create the internal peptides from the 40 kDa cyanogen bromide repeat. Utilizing the 
combined sequences, those obtained by amino acid sequencing and those identified in the EST 
(#BE005912) and a second EST (#AA640762) identified in the database, sense primers were created 
as follows: 5'-GGA GAG GGT TCT GCA GGG TC-3' (SEQ ID NO: 7) representing amino acids 
ERVLQG and anti-sense primer, 5' GTG AAT GGT ATC AGG AGA GG-3' (SEQ ID NO: 9) 
representing PLLIPF. Using PCR, the presence of transcripts was confirmed representing these 
sequences in ovarian tumors and their absence in normal ovary and either very low levels or no 
detectable levels in a mucinous tumor (Figure 2A). The existence of transcripts was further
confirmed in cDNA derived from multiple primary ovarian carcinoma cell lines and the absence of
transcripts in matched lymphocyte cultures from the same patient (Figure 2B). 
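
The stated correspondence between the two primers and the peptide sequences ERVLQG and PLLIPF can be checked by conceptual translation under the standard genetic code. The sketch below is illustrative only; the helper names are hypothetical, and the reading frames used (the second frame of the sense primer, and the first frame of the reverse complement of the anti-sense primer) are the ones found to reproduce the recited peptides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; a trailing
    partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Primers as recited in the text (SEQ ID NOS: 7 and 9), spaces removed.
SENSE_PRIMER = "GGAGAGGGTTCTGCAGGGTC"
ANTISENSE_PRIMER = "GTGAATGGTATCAGGAGAGG"

sense_peptide = translate(SENSE_PRIMER[1:])  # second reading frame
antisense_peptide = translate(reverse_complement(ANTISENSE_PRIMER))

print(sense_peptide)      # ERVLQG
print(antisense_peptide)  # PLLIPF
```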

After cloning and sequencing of the amplified 400 base pair PCR products, a series of 
sequences were identified, which had high homology to each other but which were clearly distinct 
repeat entities (Figure 3) (SEQ ID NOS: 158 through 161).

Examples of each category of repeats were sequenced, and the results are shown in Tables 
3, 4, and 5. The sequences represent amplification and sequence data of PCR products obtained 
using oligonucleotide primers derived from an EST (Genbank Accession No. BE005912). Table 
3 illustrates the amino acid sequence for a 400 bp repeat in the CA125 molecule, which is 
identified as SEQ ID NO: 11 through SEQ ID NO: 21. Table 4 illustrates the amino acid
sequence for an 800 bp repeat in the CA125 molecule, which corresponds to SEQ ID NO: 22
through SEQ ID NO: 35. Table 5 illustrates the amino acid sequence for a 1200 bp repeat in the 
CA125 molecule, which is identified as SEQ ID NO: 36 through SEQ ID NO: 46. Assembly of 
these repeat sequences (which showed 75-80% homology to each other as determined by GCG 
Software (GCG = Genetics Computer Group) using the Pileup application) utilizing PCR 
amplification and sequencing of overlapping sequences allowed for the construction of a 9 repeat 
structure. The amino acid sequence for the 9 repeat is shown in Table 6 as SEQ ID NO: 47. The 
individual C-enclosures are highlighted in the table. 

Using the assembled repeat sequence in Table 6 to search GenBank databases, a cDNA
sequence referred to as GenBank Accession No. AK024365 (entered on 9/29/00) was discovered.
Table 7 shows the amino acid sequence for AK024365, which corresponds to SEQ ID NO: 48. 
AK024365 was found to overlap with two repeats of the assembled repeat sequence shown in 
Table 6. Individual C-enclosures are highlighted in Table 7. 

The cDNA for AK024365 allowed alignment of four additional repeats as well as a 
downstream carboxy terminus sequence of the CA125 gene. Table 8 illustrates the complete
DNA sequence of 13 repeats contiguous with the carboxy terminus of the CA125 molecule,
which corresponds to SEQ ID NO: 49. Table 9 illustrates the complete amino acid sequence of
the 13 repeats and the carboxy terminus of the CA125 molecule, which corresponds to SEQ ID
NO: 50. The carboxy terminus domain was further confirmed by the existence of two ESTs
(GenBank Accession Nos. AW150602 and AI923224) in the GenBank database, both of which
confirmed the stop codon indicated (TGA) as well as the poly A signal sequence (AATAA) and
the poly A tail (see Table 9). The presence of these repeats has been confirmed in serous ovarian
tumors and their absence in normal ovarian tissue and mucinous tumors as expected (see Figure
2A). Also, the transcripts for these repeats have been shown to be present in tumor cell lines
derived from ovarian tumors, but not in normal lymphocyte cell lines (Figure 2B). Moreover,
Northern blot analysis of mRNA derived from normal tissue or ovarian carcinoma and probed with a
32P-labeled CA125 repeat sequence (as shown in Figure 6) confirmed the presence of an RNA
transcript in excess of 20 kb in ovarian tumor extracts (see Figure 2B).

To date, 45 repeat sequences have been identified with high homology to each other. To
order these repeat units, overlapping sequences were amplified using a sense primer (5'-GTC TCT
ATG TCA ATG GTT TCA CCC-3') (SEQ ID NO: 305) from an upstream repeat and an antisense
primer from a downstream repeat sequence (5'-TAG CTG CTC TCT GTC CAG TCC-3')
(SEQ ID NO: 306). Attempts have been made to place these repeats in a contiguous fashion as
shown in Figure 3. There is some potential redundancy. Further, there is evidence from overlapping
sequences that some repeats exist in more than one location in the sequence, giving a total of more
than 60 repeats in the CA125 molecule (see Table 21, SEQ ID NO: 162).

Final confirmation of the relationship of the putative CA125 repeat domain to the known
CA125 molecule was achieved by expressing a recombinant repeat domain in E. coli. In Figure 4,
expression of a recombinant CA125 repeat domain is shown in lane 2 compared to the vector alone in
lane 1, Panel D. A series of Western blots representing E. coli extracts of vector alone in lane 1,
CA125 recombinant protein in lane 2, and recombinant TADG-14 (an unrelated recombinant
protease) in lane 3 were probed with the CA125 antibodies M11, Panel A; OC125, Panel B; and
ISOBM 9.2, Panel C. In all cases, CA125 antibodies recognized only the recombinant CA125
antigen (lane 2 of each panel).

To further characterize the epitope location of the CA125 antibodies, recombinant CA125
repeat was digested with the endoprotease Lys-C and separately with the protease Asp-N. In both
cases, epitope recognition was destroyed. As shown in Figure 5, the initial cleavage site for Asp-N
is at amino acid #76 (indicated by arrow in Figure 5C). This sequence (amino acids #1-76), a 17
kDa band, was detected with anti-histidine antibodies (Figure 5A, Lane 3) and found to have no
capacity to bind CA125 antibodies (Figure 5B, Lane 3). The upper bands in Figures 5A and 5B
represent the undigested remaining portion of the CA125 recombinant repeat. From these data, one
can reasonably conclude that epitopes are either located at the site of cleavage and are destroyed by
Asp-N or are downstream from this site and also destroyed by cleavage. Likewise, cleavage with
Lys-C would result in a peptide which includes amino acids #68-154 (Figure 5C) and again, no
antibody binding was detected. In view of the foregoing, it seems likely that epitope binding resides
in the cysteine loop region containing a possible disulfide bridge (amino acids #59-79). Final
confirmation of epitope sites is being pursued by mutating individual amino acids.
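
The reasoning above can be restated as a comparison of residue intervals. The sketch below is illustrative only and assumes that the fragment boundaries and the cysteine loop positions recited in the text all use the same residue numbering; the names used are hypothetical. It shows that neither protease fragment contains the entire #59-79 loop, which is consistent with the loss of antibody binding in both digests.

```python
# Residue intervals as recited in the text (inclusive, same numbering assumed).
C_ENCLOSURE = (59, 79)
FRAGMENTS = {
    "Asp-N fragment": (1, 76),
    "Lys-C peptide": (68, 154),
}


def contains(outer, inner):
    """True if the inclusive interval `outer` fully contains `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlap(a, b):
    """Return the inclusive intersection of two intervals, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# Residues strictly between the two cysteines.
residues_between_cysteines = C_ENCLOSURE[1] - C_ENCLOSURE[0] - 1

report = {
    name: {
        "contains_loop": contains(span, C_ENCLOSURE),
        "loop_overlap": overlap(span, C_ENCLOSURE),
    }
    for name, span in FRAGMENTS.items()
}

print(residues_between_cysteines)  # 19
for name, result in report.items():
    print(name, result)
```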

To determine transcript size of the CA125 molecule, Northern blot analysis was performed on
mRNA extracts from both normal and tumor tissues. In agreement with the notion that CA125 may
be represented by an unusually large transcript due to its known megadalton size in tumor sera,
ascites fluid, and peritoneal fluid [Nustad K et al., CA125 - epitopes and molecular size, Int. J of
Biolog. Markers, 13(4):196-199 (1998)], a transcript was discovered which barely entered the gel
from the holding well (Figure 6). CA125 mRNA was only present in the tumor RNA sample and,
while a precise designation of its true size remains difficult due to the lack of appropriate standards,
its unusually large size would accommodate a protein core structure in excess of 11,000 amino acids.

Evidence demonstrates that the repeat domain of the CA125 molecule encompasses a
minimum of 45 different 156 amino acid repeat units and possibly greater than 60 repeats, as
individual repeats occur more than once in the sequence. This finding may well account for the
extraordinary size of the observed transcript. The amino acid composition of the repeat units (Figure
7A, 7C, Table 21) indicates that the sequence is rich in serine, threonine, and proline, typical of the
high STP repeat regions of the mucin genes [Gum Jr., JR, Mucin genes and the proteins they encode:
Structure, diversity and regulation, Am J Respir. Cell Mol. Biol. 7:557-564 (1992)]. Results suggest
that the downstream end of the repeat is heavily glycosylated.
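
The size relationships stated above reduce to simple arithmetic. The sketch below is illustrative only and counts coding sequence alone (three nucleotides per amino acid); untranslated regions and the non-repeat domains are deliberately left out, and the names used are hypothetical.

```python
REPEAT_LENGTH_AA = 156   # amino acids per repeat unit, as stated in the text
NT_PER_CODON = 3


def coding_nucleotides(amino_acids):
    """Coding nucleotides needed for a given number of amino acids."""
    return amino_acids * NT_PER_CODON


# Repeat-domain sizes for the minimum (45) and projected (60) repeat counts.
repeat_domain = {
    count: {
        "amino_acids": count * REPEAT_LENGTH_AA,
        "coding_nt": coding_nucleotides(count * REPEAT_LENGTH_AA),
    }
    for count in (45, 60)
}

# Coding capacity implied by a protein core of 11,000 amino acids.
core_coding_nt = coding_nucleotides(11_000)

for count, size in repeat_domain.items():
    print(count, size)
print(core_coding_nt)  # 33000
```

On these figures, 45 repeats correspond to 7,020 amino acids (21,060 coding nucleotides) and 60 repeats to 9,360 amino acids (28,080 coding nucleotides), so the repeat domain alone is of the same order as a transcript in excess of 20 kb.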

Also noteworthy is a totally conserved methionine at position 24 of the repeat (Figure 7A,
7C). It is this methionine which allowed cyanogen bromide digestion of the CA125 molecule,
resulting in the 40 kDa glycopeptide that was identified with OC125 and M11 antibodies in Western
blots of the CNBr digested peptides. These data predict that the epitopes for the CA125 antibodies
are located in the repeat sequence. By production of a recombinant product representing the repeat
sequence, results have confirmed this to be true. A potential disulfide bond is noted, which would
encompass a C-enclosure comprising 19 amino acids enclosed by two cysteines at positions #59 and
#79. The cysteines are totally conserved, which suggests a biological role for the resulting putative
C-enclosure in each repeat. As mentioned above, it is likely that the OC125 and M11 epitopes are
located in the C-enclosure, indicating its relative availability for immune detection. This is probably
due to the C-enclosure structure and the paucity of glycosylation in the immediate surrounding areas.
Domain searches also suggest some homology in the repeat domain to an SEA domain commonly
found in the mucin genes [Williams SJ et al., MUC13, a novel human cell surface mucin expressed
by epithelial and hemopoietic cells, J of Biol. Chem 276(21):18327-18336 (2001)] beginning at amino
acid #1 and ending at #131 of each repeat. No biological function has been described for this
domain.

Based on homology of the repeat sequences to chromosome 19q13.2 (cosmid #AC008734)
and confirmed by genomic amplification, it has been established that each repeat is comprised of 5
exons (covering approximately 1900 bases of genomic DNA): exon 1 comprises 42 amino acids (#1-42);
exon 2 comprises 23 amino acids (#43-65); exon 3 comprises 58 amino acids (#66-123); exon 4
comprises 12 amino acids (#124-135); and exon 5 comprises 21 amino acids (#136-156) (see Figure
7B). Homology pile-ups of individual exons have also been completed (see Figure 7C), which
indicate that exon 1 has a minimum of 31 different copies; exon 2 has 27 copies; exon 3
has 28 copies; exon 4 has 28 copies; and exon 5 has 21 copies. If all exons were only found in a
single configuration relative to each other, one could determine that a minimum of 31 repeats
were present in the CA125 molecule. Using the exon 2 pile-up data as an example, it has been
established as mentioned above that there are 27 individual exon 2 sequences. Using exon 2, which
was sequenced fully in both the repeat units and the overlaps, results established that a minimum of
45 repeat units are present when exon 2 is combined with unique other exon combinations. However,
based on overlap sequence information, 60+ repeat units are likely present in the CA125 molecule
(Table 21). This larger number of repeat units can be accounted for by the presence of the same
repeat unit occurring in more than one location.
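
The exon bookkeeping above can be verified directly. The sketch below is illustrative only, with hypothetical names; it confirms that the five exon ranges add up to the 156 amino acid repeat and derives the lower bound of 31 repeats, on the stated assumption that every exon occurs in a single configuration relative to the others, so that the exon with the most distinct copies sets the minimum.

```python
# Exon ranges within one repeat (inclusive amino acid positions, from the text).
REPEAT_EXONS = {
    1: (1, 42),
    2: (43, 65),
    3: (66, 123),
    4: (124, 135),
    5: (136, 156),
}

# Number of distinct copies of each exon found in the homology pile-ups.
EXON_COPIES = {1: 31, 2: 27, 3: 28, 4: 28, 5: 21}

exon_lengths = {
    exon: end - start + 1 for exon, (start, end) in REPEAT_EXONS.items()
}
repeat_length = sum(exon_lengths.values())

# Under the single-configuration assumption, each distinct copy of the most
# variable exon must sit in a different repeat.
min_repeats_single_configuration = max(EXON_COPIES.values())

print(exon_lengths)                      # {1: 42, 2: 23, 3: 58, 4: 12, 5: 21}
print(repeat_length)                     # 156
print(min_repeats_single_configuration)  # 31
```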

Currently, the repetitive units of the repeat domain of the CA125 molecule constitute the
majority of its extracellular molecular structure. These sequences have been presented in a tandem
fashion based on overlap sequencing data. Some sequences may be incorrectly placed and some
repeat units may not as yet be identified (Table 21). More recently, an additional repeat was
identified in CA125 as shown in Tables 22 and 23 (SEQ ID NOS: 307 and 308). The exact position
has not yet been identified. Also, there is a potential that alternate splicing and/or mutation could
account for some of the repeat variants that are listed. Studies are being conducted to compare
normal tissue derived CA125 repeats to individual tumor derived CA125 repeats to determine if such
variation is present. Currently, the known exon configurations would easily accommodate the greater
than 60 repeat units as projected. It is, therefore, unlikely that alternate splicing is a major
contributor to the repetitive sequences in CA125. It should also be noted that the genomic database
for chromosome 19q13.2 only includes about 10 repeat units, thus indicating a discrepancy between
the data of the present invention (more than 60 repeats) and the genomic database. A recent
evaluation of the methods used for selection and assembly for genomic sequence [Marshall E, DNA
Sequencing: Genome teams adjust to shotgun marriage, Science 292:1982-1983 (2001)] reports that
"more research is needed on repeat blocks of almost identical DNA sequence which are more
common in the human genome. Existing assembly programs can't handle them well and often delete
them." The CA125 repeat units located on chromosome 19 may well be victims of deletion in the
genomic database, thus accounting for most CA125 repeat units absent from the current databases.

A. Sequence Confirmation and Assembly of the Amino Terminal Domain (Domain 1) of the
CA125 Molecule

As previously mentioned, homology for repeat sequences was found in the chromosome 19 
cosmid AC008734 of the GCG database. This cosmid at the time consisted of 35 unordered contigs. 
After searching the cosmid for repeat sequences, contig #32 was found to have exons 1 and 2 of a 
repeat unit at its 3' end. Contig #32 also had a large open reading frame upstream from the two 
repeat units, which suggested that this contig contained sequences consistent with the amino terminal 
end of the CA125 molecule. A sense primer was synthesized to the upstream non-repeat part of 
contig #32 coupled with a specific primer from within the repeat region (see Methods). PCR 
amplification of ovarian tumor cDNA confirmed the contiguous positioning of these two domains. 

The PCR reaction yielded a band of approximately 980 bp. The band was sequenced and
found to connect the upstream open reading frame to the repeat region of CA125. From these data,
more primer sets (see Methods) were synthesized and used in PCR reactions to piece together the
entire open reading frame contained in contig #32. To find the 5'-most end of the sequence, an EST
(AU133673) was identified, which linked contig #32 to contig #7 of the same cosmid. Specific
primers (5'-CTGATGGCATTATGGAACACATCAC-3' (SEQ ID NO: 59) and
5'-CCCAGAACGAGAGACCAGTGAG-3' (SEQ ID NO: 60)) were synthesized to the EST and contig #32. A PCR
reaction was performed to confirm that part of the EST sequence was in fact contiguous with contig
#32. Confirmation of this contiguity by sequencing of the overlapping sequences
allowed the assembly of the 5' region (Domain 1) (Figure 8A). 5' RACE PCR was performed on
tumor cDNA to confirm the amino terminal sequence of CA125. The test confirmed the presence of
contig #7 sequence at the amino terminal end of CA125.
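As a quick sanity check on the two primers quoted above, the following minimal sketch reports their length, GC content and a rough melting temperature. The Wallace rule used here (2 °C per A/T, 4 °C per G/C) is a rule of thumb intended for short oligonucleotides and is an assumption of this example; the source does not state how primers were evaluated, and the function names are illustrative only.

```python
# Primer sequences as quoted in the text (SEQ ID NO: 59 and SEQ ID NO: 60).
PRIMERS = {
    "SEQ ID NO: 59": "CTGATGGCATTATGGAACACATCAC",
    "SEQ ID NO: 60": "CCCAGAACGAGAGACCAGTGAG",
}


def gc_count(seq: str) -> int:
    """Number of G or C bases in a DNA sequence."""
    return sum(1 for base in seq.upper() if base in "GC")


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C).

    This is only a rule-of-thumb estimate (an assumption of this sketch).
    """
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


def summarize(seq: str) -> dict:
    """Length, GC fraction and rough Tm for one primer."""
    return {
        "length": len(seq),
        "gc_fraction": gc_count(seq) / len(seq),
        "wallace_tm": wallace_tm(seq),
    }


for name, seq in PRIMERS.items():
    print(name, summarize(seq))
```

The two primers come out at similar estimated melting temperatures, which is what one would generally want for a primer pair used in the same PCR reaction.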

The amino terminal domain comprises five genomic exons covering approximately 13,250 bp.
Exon 1, a small exon (amino acids #1-33), is derived from contig #7 (Figure 8A). The remaining
exons are all derived from contig #32: Exon 2 (amino acids #34-1593), an extraordinarily large exon,
Exon 3 (amino acids #1594-1605), Exon 4 (amino acids #1606-1617) and Exon 5 (amino acids
#1618-1637) (see Figure 8A).
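The exon boundaries listed above can be checked for contiguity and converted to lengths with a few lines of code. This is a minimal sketch: the coordinates are taken from the text, while the tuple layout and function names are illustrative assumptions.

```python
# (exon number, first amino acid, last amino acid), as given in the text.
EXONS = [
    (1, 1, 33),
    (2, 34, 1593),
    (3, 1594, 1605),
    (4, 1606, 1617),
    (5, 1618, 1637),
]


def exon_lengths(exons):
    """Map exon number to its length in amino acids (inclusive coordinates)."""
    return {number: end - start + 1 for number, start, end in exons}


def is_contiguous(exons):
    """True when each exon starts one residue after the previous exon ends."""
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(exons, exons[1:]))


lengths = exon_lengths(EXONS)
print(lengths)                 # per-exon lengths in amino acids
print(sum(lengths.values()))   # total length of Domain 1
print(is_contiguous(EXONS))    # boundaries leave no gaps or overlaps
```

Under these coordinates Exon 2 alone contributes 1560 of the 1637 residues of Domain 1, which is why the text calls it extraordinarily large.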

Potential N-glycosylation sites, marked (x), are encoded at positions #81, #271, #320, #624,
#795, #834, #938, and #1165 (see Figure 8B). O-glycosylation sites are extraordinarily abundant
and essentially cover the amino terminal domain (Figure 8B). As shown by the O-glycosylation
pattern, Domain 1 is highly enriched in both threonine and serine (Figure 8B).
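The source does not say how the potential N-glycosylation sites were predicted. One common approach, assumed here purely for illustration, is to scan for the N-X-S/T sequon (asparagine, any residue except proline, then serine or threonine). The sketch below applies that scan to the first 30 residues of clone 23 from Table 3, a repeat fragment rather than Domain 1 itself; the function name is hypothetical.

```python
import re

# N-X-S/T sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps matches from consuming residues, so overlapping
# sequons would also be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# First 30 residues of clone 23 (Table 3), used only as example input.
fragment = "ERVLQGLLRPLFKNTSIGPLYSSCRLTLLR"
print(find_sequons(fragment))
```

A sequon match only marks a potential site; whether it is actually glycosylated depends on cellular context that a sequence scan cannot capture.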
B. Sequence Confirmation and Assembly of the CA125 Carboxy Terminal End (Domain 3) 
A search of GenBank using the repeat sequences described above uncovered a cDNA
sequence referred to as GenBank accession number AK024365. This sequence was found to have 2
repeat sequences, which overlapped 2 known repeat sequences of a series of 6 repeats. As a result,
the cDNA allowed the alignment of all six carboxy terminal repeats along with a unique carboxy
terminal sequence. The carboxy terminus was further confirmed by the existence of two other ESTs
(GenBank accession numbers AW150602 and AI923224), both of which confirmed a stop codon as
well as a poly-A signal sequence and a poly-A tail (see GCG database #AF414442). The sequence of
the carboxy terminal domain was confirmed using a sense primer designed to sequence just downstream of
the repeat domain (5'-GGA CAA GGT CAC CAC ACT CTA C-3') (SEQ ID NO: 303)
and an antisense primer (5'-GCA GAT CCT CCA GGT CTA GGT GTG-3') (SEQ ID NO: 304)
designed to the carboxy terminus (Figure 9A).

The carboxy terminal domain covers more than 14,000 genomic bp. By ligation, this domain
comprises nine exons as shown in Figure 9A. The carboxy terminus is defined by a 284 amino acid
sequence downstream from the repeat domains (see Figure 9B). Both N-glycosylation sites marked
(x) (#31, #64, #103, #140, #194, #200) and a small number of O-glycosylation sites marked (o) are
predicted for the carboxy end of the molecule (Figures 9A, 9B). Of special note is a putative
transmembrane domain at positions #230-#252 followed by a cytoplasmic domain, which is
characterized by a highly basic sequence adjacent to the membrane (#256-#260) as well as several
potential S/T phosphorylation sites (#254, #255, #276) and tyrosine phosphorylation sites (#264,
#273, #274) (Figures 9A, 9B).
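The positions quoted above can be cross-checked against the stated topology: everything before the transmembrane segment should be extracellular and everything after it cytoplasmic. The following is a minimal sketch using only the coordinates given in the text; the dictionary keys and function names are illustrative assumptions.

```python
DOMAIN_LENGTH = 284          # amino acids downstream of the repeat domain
TRANSMEMBRANE = (230, 252)   # putative transmembrane segment

# Feature positions as quoted in the text.
FEATURES = {
    "n_glycosylation": [31, 64, 103, 140, 194, 200],
    "ser_thr_phosphorylation": [254, 255, 276],
    "tyr_phosphorylation": [264, 273, 274],
    "basic_stretch": list(range(256, 261)),
}


def region(position: int) -> str:
    """Classify a residue relative to the transmembrane segment."""
    if not 1 <= position <= DOMAIN_LENGTH:
        raise ValueError(f"position {position} is outside the domain")
    if position < TRANSMEMBRANE[0]:
        return "extracellular"
    if position <= TRANSMEMBRANE[1]:
        return "transmembrane"
    return "cytoplasmic"


def regions_for(feature: str) -> set:
    """Set of regions occupied by all positions of one feature."""
    return {region(pos) for pos in FEATURES[feature]}


for name in FEATURES:
    print(name, regions_for(name))

# Residues after the transmembrane segment form the cytoplasmic tail.
print("cytoplasmic tail length:", DOMAIN_LENGTH - TRANSMEMBRANE[1])
```

With these coordinates the glycosylation sites fall on the extracellular side and the phosphorylation sites on the cytoplasmic side, consistent with the description of a short cytoplasmic tail.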

Assembly of the CA125 molecule, as validated by PCR amplification of overlap sequence,
provides a picture of the whole molecule (see Figure 10 and Table 21). The complete nucleotide
sequence is available in GenBank, Accession #AF414442, and the amino acid sequence as currently
aligned is shown in Table 21.

DISCUSSION 

The CA125 molecule comprises three major domains: an extracellular amino terminal domain
(Domain 1), a large multiple repeat domain (Domain 2) and a carboxy terminal domain (Domain 3),
which includes a transmembrane anchor with a short cytoplasmic domain (Figure 10). The amino
terminal domain is assembled by combining five genomic exons, four very short amino terminal
sequences and one extraordinarily large exon, which often typifies mucin extracellular glycosylated
domains [Desseyn JL et al., Human mucin gene MUC5B, the 10.7-kb large central exon encodes
various alternate subdomains resulting in a super-repeat. Structural evidence for a 11p15.5 gene
family, J Biol. Chem. 272(6):3168-3178 (1997)]. This domain is dominated by its capacity for
O-glycosylation and its resultant richness in serine and threonine residues. Overall, the potential for
O-glycosylation essentially covers this domain and, as such, may allow the carbohydrate superstructure
to influence ECM interaction at this end of the CA125 molecule (Figure 8). There is one short area
(amino acids #74-120) where little or no glycosylation is predicted, which could allow for
protein-protein interaction in the extracellular matrix.

Efforts to purify CA125 over the years were obviously complicated by the presence of this
amino terminal domain, which is unlikely to have any epitope sites recognized by the OC125 or M11
class antibodies. As the CA125 molecule is degraded in vivo, it is likely that this highly glycosylated
amino terminal end will be found associated with varying numbers of repeat units. This could very
well account for both the charge and size heterogeneity of the CA125 molecule so often identified
from serum and ascites fluid. Also of note are two T-TALK sequences at amino acids #45-58
(underlined in Figure 8B), which are unique to the CA125 molecule.

The extracellular repeat domain, which characterizes the CA125 molecule, also represents a
major portion of the molecular structure. It is downstream from the amino terminal domain and
presents itself in a much different manner to its extracellular matrix neighbors. These repeats are
characterized by many features including a highly conserved nature (Figure 3) and a uniformity in
exon structure (Figure 7). But most consistently, a cysteine enclosed sequence may form a cysteine
loop (Table 21). This structure may provide extraordinary potential for interaction with neighboring
matrix molecules. Domain 2 encompasses the 156 amino acid repeat units of the CA125 molecule.
The repeat domain constitutes the largest proportion of the CA125 molecule (Table 21 and Figure
10). Because it has been known for more than 15 years that antibodies bind in a multivalent fashion
to CA125, it has been predicted that the CA125 molecule would include multiple repeat domains
capable of binding the OC125 and M11 class of sentinel antibodies which define this molecule
[O'Brien et al., New monoclonal antibodies identify the glycoprotein carrying the CA125 epitope,
Am J Obstet Gynecol. 165:1857-1964 (1991); Nustad K et al., Specificity and affinity of 26
monoclonal antibodies against the CA125 antigen: First report from the ISOBM TD-1 workshop,
Tumor Biology 17:196-219 (1996); and Bast RC et al., A radioimmunoassay using a monoclonal
antibody to monitor the course of epithelial ovarian cancer, N. Engl. J. Med. 309:883-887 (1983)]. In
the present invention, more than 60 repeat units have been identified, which are in tandem array in
the extracellular portion of the CA125 molecule. Individual repeat units have been confirmed by
sequencing and further identified by PCR amplification of the overlapping repeat sequences. Results
confirm the contiguous placement of most repeats relative to their neighbors (Table 21).

Initial evidence suggests that this area is a potential site for antibody binding and also for
ligand binding. The highly conserved methionine and several highly conserved sequences within the
repeat domain also suggest a functional capacity for these repeat units. The extensive glycosylation
of exons 4 and 5 of the repeat unit and the N-glycosylation potential in exon 1 and the 5' end of exon 2
might further point to a functional capacity for the latter part of exon 2 and exon 3, which includes the
C-enclosure (see Figure 7). It should be apparent that the C-enclosure might be a prime target for
protease activity, and such cleavage may well explain the difficulty experienced by many
investigators in obtaining an undigested CA125 parent molecule. Such activity might explain the
diffuse pattern of antibody binding and the loss of antibody binding for molecules of less than
200,000 Da. Proteolysis would destroy the epitopes and, therefore, only multiple repeats could be
identified by blotting with CA125 antibodies. The repeat unit organization also suggests the potential
for a multivalent interaction with extracellular entities.

The carboxy terminal domain of the CA125 molecule comprises an extracellular domain,
which does not have any homology to other known domains. It encodes a typical transmembrane
domain and a short cytoplasmic tail. It also contains a proteolytic cleavage site approximately 50
amino acids upstream from the transmembrane domain. This would allow for proteolytic cleavage
and release of the CA125 molecule (Figure 9). As indicated by Fendrick et al. [CA125
phosphorylation is associated with its secretion from the WISH human amnion cell line, Tumor
Biology 18:278-289 (1997)], release of the CA125 molecule is preceded by phosphorylation and
sustained by inhibitors of phosphatases, especially inhibition of phosphatase 2B. The cytoplasmic
tail, which contains S/T phosphorylation sites next to the transmembrane domain and tyrosine
phosphorylation sites downstream from there, could accommodate such phosphorylation. A very
distinguishable positively charged sequence is present upstream from the tyrosine, suggesting a signal
transduction system involving negatively charged phosphate groups and positively charged lysine and
arginine groups.

These features of the CA125 molecule suggest a signal transduction pathway involvement in
the biological function of CA125 [Fendrick JL et al., CA125 phosphorylation is associated with its
secretion from the WISH human amnion cell line, Tumor Biology 18:278-289 (1997); and Konishi I et
al., Epidermal growth factor enhances secretion of the ovarian tumor-associated cancer antigen
CA125 from the human amnion WISH cell line, J Soc. Gynecol Invest. 1:89-96 (1994)]. They also
reinforce the prediction of phosphorylation prior to CA125 release from the membrane surface as
previously proposed [Fendrick JL et al., CA125 phosphorylation is associated with its secretion from
the WISH human amnion cell line, Tumor Biology 18:278-289 (1997); and Konishi I et al., Epidermal
growth factor enhances secretion of the ovarian tumor-associated cancer antigen CA125 from the
human amnion WISH cell line, J Soc. Gynecol Invest. 1:89-96 (1994)]. Furthermore, a putative
proteolytic cleavage site on the extracellular side of the transmembrane domain is present at position
#176-181.
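The cleavage site position (#176-181) can be reconciled with the earlier statement that the site lies approximately 50 amino acids upstream of the transmembrane domain (#230-#252). A minimal sketch of that arithmetic follows; the convention of measuring from the last residue of the site to the first residue of the transmembrane segment is an assumption of this example.

```python
CLEAVAGE_SITE = (176, 181)   # putative proteolytic cleavage site
TRANSMEMBRANE = (230, 252)   # putative transmembrane segment


def residues_upstream(site, membrane_segment) -> int:
    """Distance from the end of the site to the start of the membrane segment."""
    return membrane_segment[0] - site[1]


print(residues_upstream(CLEAVAGE_SITE, TRANSMEMBRANE))
```

The result, 49 residues, agrees with the description of a site roughly 50 amino acids upstream of the membrane.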

How well does the CA125 structure described in the present invention compare to the
previously known CA125 structure? O'Brien et al. reported that a number of questions needed to be
addressed: 1) the multivalent nature of the molecule; 2) the heterogeneity of CA125; 3) the
carbohydrate composition; 4) the secretory or membrane bound nature of the CA125 molecule; 5) the
function of the CA125 molecule; and 6) the elusive CA125 gene [More than 15 years of CA125:
What is known about the antigen, its structure and its function, Int J Biological Markers
13(4):188-195 (1998)]. Several of these questions have been addressed in the present invention including, of
course, the gene and its protein core product. Perhaps most interesting is the question of whether
an individual large transcript accounted for the whole CA125 molecule, or a number of smaller
transcripts which represented subunits that specifically associated to produce the CA125 molecule.
From the results produced by way of the present invention, it is now apparent that the transcript of
CA125 is large - similar to some of the mucin gene transcripts, e.g., MUC5B [see Verma M et al.,
Mucin genes: Structure, expression and regulation, Glycoconjugate J. 11:172-179 (1994); and
Gendler SJ et al., Epithelial mucin genes, Annu. Rev. Physiol. 57:607-634 (1995)]. The protein core
extracellular domains all have a high capacity for O-glycosylation and, therefore, probably account
for the heterogeneity of charge and size encountered in the isolation of CA125. The data also
confirm the O-glycosylation inhibition data, indicating CA125 to be rich in O-glycosylation [Lloyd
KO et al., Synthesis and secretion of the ovarian cancer antigen CA125 by the human cancer cell line
NIH: OVCAR-3, Tumor Biology 22:77-82 (2001); Lloyd KO et al., Isolation and characterization of
ovarian cancer antigen CA125 using a new monoclonal antibody (VK-8): Identification as a
mucin-type molecule, Int. J. Cancer 71:842-850 (1997); and Fendrick JL et al., Characterization of CA125
synthesized by the human epithelial amnion WISH cell line, Tumor Biology 14:310-318 (1993)].

The repeat domain, which includes more than 60 repeat units, accounts for the multivalent
nature of the epitopes present, as each repeat unit likely contains epitope binding sites for both
OC125-like antibodies and M11-like antibodies. The presence of a transmembrane domain and
cleavage site confirms the membrane association of CA125, and reinforces the data which indicate a
dependence of CA125 release on proteolysis. Also, the release of CA125 from the cell surface may
well depend on cytoplasmic phosphorylation and be the result of EGF signaling [Nustad K et al.,
Specificity and affinity of 26 monoclonal antibodies against the CA125 antigen: First report from the
ISOBM TD-1 workshop, Tumor Biology 17:196-219 (1996)]. As for the question of whether CA125
has an inherent capacity for proteolytic activity, this does not appear to be the case. However, it is likely
that the associated proteins isolated along with CA125 (e.g., the 50 kDa protein which has no
antibody binding ability) may have proteolytic activity. In any case, proteolysis of an extracellular
cleavage site is the most likely mechanism of CA125 release. Such cleavage would be responsive to
cytoplasmic signaling and mediated by an associated extracellular protease activity.

In summary, the large number of tandem repeats of the CA125 molecule, which dominate its
molecular structure and contain the likely epitope binding sites of the CA125 molecule, was
unexpected. Also, one cannot as yet account for the proteolytic activity, which has plagued the
isolation and characterization of this molecule for many years. While no protease domain per se is
constitutively part of the CA125 molecule, there is a high likelihood of a direct association of an
extracellular protease with the ligand binding domains of the CA125 molecule. Finally, what is the
role of the dominant repeat domain of this extracellular structure? Based on the expression data of
CA125 on epithelial surfaces and in glandular ducts, it is reasonable to conclude that the unique
structure of these repeat units with their cysteine loops plays a role as glandular anti-invasive
molecules (bacterial entrapment) and/or a role in anti-adhesion (maintaining patency) between
epithelial surfaces and in ductal linings.

Recently, Yin and Lloyd described the partial cloning of the CA125 antigen using a
completely different approach from that described in the present invention [Yin BWT et al., Molecular
cloning of the CA125 ovarian cancer antigen. Identification as a new mucin (MUC16), J Biol. Chem.
276:27371-27375 (2001)]. Utilizing a polyclonal antibody to CA125 to screen an expression library
of the ovarian tumor cell line OVCAR-3, these researchers identified a 5965 bp clone containing a
stop codon and a poly A tail, which included nine partially conserved tandem repeats followed by a
potential transmembrane region with a cytoplasmic tail. The 5965 bp sequence is almost completely
homologous to the carboxy terminus region shown in Table 21. Although differing in a few bases,
the sequences are homologous. As mentioned above, the cytoplasmic tail has the potential for
phosphorylation and a transmembrane domain would anchor this part of the CA125 molecule to the
surface of the epithelial or tumor cell. In the extracellular matrix, a relatively short transition domain
connects the transmembrane anchor to a series of tandem repeats - in the case of Yin and Lloyd, nine.

By contrast, the major extracellular part of the molecule of the present invention as shown is
upstream from the sequence described by Yin and includes a large series of tandem repeats. These
results, of course, provide a different picture of the CA125 molecule, which suggests that CA125 is
dominated by the series of extracellular repeats. Also included is a major amino terminal domain
(~1638 amino acids) for the CA125 molecule, which it is believed accounts for a great deal of the
O-glycosylation known to be an important structural component of CA125.

In conclusion, a CA125 molecule is disclosed which requires a transcript of more than 35,000
bases and occupies approximately 150,000 bp on chromosome 19q13.2. It is dominated by a large
series of extracellular repeat units (156 amino acids), which offer the potential for molecular
interactions, especially through a highly conserved unique cysteine loop. The repeat units also
include the epitopes now well-described and classified for both major classes of CA125 antibodies
(i.e., the OC125 and the M11 groups). The CA125 molecule is anchored at its carboxy terminal
through a transmembrane domain and a short cytoplasmic tail. CA125 also contains a highly
glycosylated amino terminal domain, which includes a large extracellular exon typical of some
mucins. Given the massive repeat domain present on both epithelial surfaces and ovarian tumor cell
surfaces, it might be anticipated that CA125 may play a major role in determining the extracellular
environment surrounding epithelial and tumor cells.
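The figures quoted in this section allow a rough lower bound on the size of the protein core and its coding sequence. The sketch below assumes the three domains are simply concatenated without overlap, uses 1637 residues for Domain 1 (the exon coordinates given earlier), and takes exactly 60 repeats as the minimum; untranslated regions and any repeats beyond 60 are not counted, so the result is expected to fall below the stated transcript size of more than 35,000 bases.

```python
DOMAIN1_AA = 1637       # amino terminal domain (exons 1-5)
REPEAT_UNIT_AA = 156    # one repeat unit of Domain 2
DOMAIN3_AA = 284        # carboxy terminal domain
MIN_REPEATS = 60        # the text reports "more than 60" repeat units


def protein_length(repeats: int) -> int:
    """Protein core length in amino acids for a given number of repeats."""
    return DOMAIN1_AA + repeats * REPEAT_UNIT_AA + DOMAIN3_AA


def coding_nucleotides(repeats: int) -> int:
    """Coding sequence length in nucleotides (three per amino acid)."""
    return 3 * protein_length(repeats)


print(protein_length(MIN_REPEATS))       # amino acids
print(coding_nucleotides(MIN_REPEATS))   # nucleotides
```

With 60 repeats the repeat domain alone contributes 9360 of the 11,281 residues, in line with the statement that the repeats dominate the molecule.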
Advantages and Uses of the CA125 Recombinant Products

1) Current assays for CA125 utilize as standards CA125 produced either from cultured cell
lines or from patient ascites fluid. Neither source is defined with regard to the quality or purity of
the CA125 molecule. Therefore, arbitrary units are used to describe patient levels of CA125.
Because cut-off values are important in the treatment of patients with elevated CA125 and
because many different assay systems are used clinically to measure CA125, it is relevant and
indeed necessary to define a standard for all CA125 assays. Recombinant CA125 containing
epitope binding sites could fulfill this need for standardization. Furthermore, new and more
specific assays may be developed utilizing recombinant products for antibody production.

2) Vaccines: Adequate data now exist [see Wagner U et al., Immunological
consolidation of ovarian carcinoma recurrences with monoclonal anti-idiotype antibody
ACA125: Immune responses and survival in palliative treatment, Clin. Cancer Res. 7:1112-1115
(2001)] to suggest and support the idea that CA125 could be used as a therapeutic vaccine to
treat patients with ovarian carcinoma. Heretofore, in order to induce cellular and humoral
immunity in humans to CA125, murine antibodies specific for CA125 were utilized in
anticipation of patient production of anti-idiotypic antibodies, thus indirectly allowing the
induction of an immune response to the CA125 molecule. With the availability of recombinant
CA125, especially domains which encompass epitope binding sites for known murine antibodies
and domains directly anchoring CA125 on the tumor cell, it will be feasible to more directly
stimulate patients' immune systems to CA125 and, as a result, extend the life of ovarian
carcinoma patients as demonstrated by Wagner et al.

Several approaches can be utilized to achieve such a therapeutic response in the immune
system: 1) directly immunizing the patient with recombinant antigen containing the CA125
epitopes or other domains; or 2) harvesting dendritic cells from the patient; 3) expanding these cells
in in vitro culture; 4) activating the dendritic cells with the recombinant CA125 epitope domain
or other domains or with peptides derived from these domains [see Santin AD et al., Induction of
ovarian tumor-specific CD8+ cytotoxic T lymphocytes by acid-eluted peptide-pulsed autologous
dendritic cells, Obstetrics & Gynecology 96(3):422-430 (2000)]; and then 5) returning these
immune stem cells to the patient to achieve an immune response to CA125. This procedure can
also be accomplished using specific peptides which are compatible with histocompatibility
antigens of the patient. Such peptides compatible with the HLA-A2 binding motifs common in
the population are indicated in Figure 12.
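Candidate peptides of this kind are often shortlisted by scanning a protein for nonamers that carry anchor residues at fixed positions. The sketch below assumes a simplified HLA-A2 anchor pattern (leucine or methionine at position 2, valine or leucine at position 9); this pattern, the function name and the example fragment (the first 20 residues of clone 23 in Table 3) are assumptions for illustration and do not reproduce the peptides of Figure 12.

```python
# Simplified anchor pattern assumed for this sketch, not taken from the source.
POSITION2_ANCHORS = "LM"
POSITION9_ANCHORS = "VL"
PEPTIDE_LENGTH = 9


def find_candidate_nonamers(protein: str) -> list:
    """Return (1-based start, peptide) for nonamers matching both anchors."""
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - PEPTIDE_LENGTH + 1):
        peptide = protein[start:start + PEPTIDE_LENGTH]
        if peptide[1] in POSITION2_ANCHORS and peptide[8] in POSITION9_ANCHORS:
            hits.append((start + 1, peptide))
    return hits


# First 20 residues of clone 23 (Table 3), used only as example input.
fragment = "ERVLQGLLRPLFKNTSIGPL"
print(find_candidate_nonamers(fragment))
```

An anchor match is only a first filter; actual binding and immunogenicity would have to be established experimentally.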

3) Therapeutic Targets: Molecules that are expressed on the surface of tumor cells, as
CA125 is, offer potential targets for immune stimulation, drug delivery, biological modifier
delivery or any agent which can be specifically delivered to ultimately kill the tumor cells.
CA125 offers such potential as a target: 1) Antibodies to CA125 epitopes or newly described
potential epitopes: most especially, humanized or human antibodies to CA125 could
directly activate the patient's immune system to attack and kill tumor cells. Antibodies could also be
used to deliver drugs or toxic agents, including radioactive agents, to mediate direct killing of
tumor cells. 2) Natural ligands: Under normal circumstances, molecules are bound to the CA125
molecule, e.g., a 50 kDa protein which does not contain CA125 epitopes co-purifies with
CA125. Such a molecule, which might have a natural binding affinity for domains on the CA125
molecule, could also be utilized to deliver therapeutic agents to tumor cells.

4) Antisense therapy: CA125 expression may provide a survival or metastatic advantage
to ovarian tumor cells. As such, antisense oligonucleotides derived from the CA125 sequence could
be used to down-regulate the expression of CA125. Antisense therapy could be used in
association with a tumor cell delivery system such as described above.

5) Small Molecules: Recombinant domains of CA125 also offer the potential to identify
small molecules which bind to individual domains of the molecule. Small molecules, either from
combinatorial chemical libraries or small peptides, can also be used as delivery agents or as
biological modifiers.

All references referred to herein are hereby incorporated by reference in their entirety.
It should be understood that various changes and modifications to the presently preferred
embodiments described herein will be apparent to those skilled in the art. Such changes and
modifications can be made without departing from the spirit and scope of the present invention
and without diminishing its attendant advantages.
TABLE 1



(SEQ ID NO: 5 and SEQ ID NO: 6, respectively) 



anvna Nterm - [OHPGSRKFKTTEq (SEQ ID NO: 1) 



Peak 68 - FLTVERVLQGL (SEQ ID NO: 2) 

Peak 65 - DTYVGPLY (SEQ ID NO: 3) 

Peak 30 - DGAANGVD (SEQ ID NO: 4) 



(SEQ ID NO: 5 and SEQ ID NO: 6)

  1 CGTCGACCTGGCTCTAGAAAGTTTAACACCACGGAGAGAGTCCTTCAGGGTCTGCTCAGG
    R  R  P  G  S  R  K  F  N  T  T  E  R  V  L  Q  G  L  L  R

 61 CCTGTGTTCAAGAACACCAGTGTTGGCCCTCTGTACTCTGGCTGCAGACTGACCTTGCTC
    P  V  F  K  N  T  S  V  G  P  L  Y  S  G  C  R  L  T  L  L

121 AGGCCGAAGAAGGATGGGGCAGCCACCAAAGTGGATGCCATCTGCACCTACCGCCCTGAT
    R  P  K  K  D  G  A  A  T  K  V  D  A  I  C  T  Y  R  P  D

181 CCCAAAAGCCCTGGACTGGACAGAGAGCAGCTATACTGGGAGCTGAGCCAGGGTGATGCA
    P  K  S  P  G  L  D  R  E  Q  L  Y  W  E  L  S  Q  G  D  A
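The pairing of nucleotide and amino acid lines in Table 1 can be reproduced by translating each 60-nucleotide line in the first reading frame with the standard genetic code. The following is a minimal sketch; the function and variable names are illustrative, and the nucleotide lines are entered with the spacing introduced by text extraction removed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring whitespace and trailing partial codons."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotide lines of Table 1, keyed by the position of their first base.
TABLE_1 = {
    1: "CGTCGACCTGGCTCTAGAAAGTTTAACACCACGGAGAGAGTCCTTCAGGGTCTGCTCAGG",
    61: "CCTGTGTTCAAGAACACCAGTGTTGGCCCTCTGTACTCTGGCTGCAGACTGACCTTGCTC",
    121: "AGGCCGAAGAAGGATGGGGCAGCCACCAAAGTGGATGCCATCTGCACCTACCGCCCTGAT",
    181: "CCCAAAAGCCCTGGACTGGACAGAGAGCAGCTATACTGGGAGCTGAGCCAGGGTGATGCA",
}

for start, dna in TABLE_1.items():
    print(f"{start:>3} {translate(dna)}")
```

Because each line holds exactly 20 codons, the lines can be translated independently without carrying a partial codon across line breaks.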
TABLE 2A



Nucleotide and Amino Acid Sequences for Sense Primer 5'-3' (SEQ ID NO: 7 and
SEQ ID NO: 8, respectively) and Antisense Primer 5'-3'
(SEQ ID NO: 9 and SEQ ID NO: 10, respectively) based upon Regions of Homology for
EST Genbank Accession Nos. BE005912 and AA640762



GGA GAG GGT TCT GCA GGG TC (SEQ ID NO: 7)

E R V L Q G (SEQ ID NO: 8)

GTG AAT GGT ATC AGG AGA GG (SEQ ID NO: 9)

P L L I P F (SEQ ID NO: 10)
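The amino acid sequences listed with the two primers can be related to the nucleotide sequences as follows: the sense primer encodes E R V L Q G when read from its second base, and the antisense primer encodes P L L I P F when its reverse complement is read from the first base. The reading frames used here were worked out for this sketch and are not stated in the source; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G and T."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str, offset: int = 0) -> str:
    """Translate from the given 0-based offset, dropping a trailing partial codon."""
    dna = dna.upper()[offset:]
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


SENSE = "GGAGAGGGTTCTGCAGGGTC"      # SEQ ID NO: 7
ANTISENSE = "GTGAATGGTATCAGGAGAGG"  # SEQ ID NO: 9

print(translate(SENSE, offset=1))                # sense primer, second frame
print(reverse_complement(ANTISENSE))             # coding-strand equivalent
print(translate(reverse_complement(ANTISENSE)))  # antisense primer region
```

Both peptides correspond to conserved stretches visible in the repeat alignments of Tables 3 and 4 (ERVLQG at the start of each repeat and PLLIPF near position 130).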



TABLE 2B 



Sense and Anti-Sense Primers Used for Ordering Repeat Units
(SEQ ID NO: 301 and SEQ ID NO: 302, respectively)



5'-GTCTCTATGTCAATGGTTTCACCC-3' (SEQ ID NO: 301)

5'-TAGCTGCTCTCTGTCCAGTCC-3' (SEQ ID NO: 302)
TABLE 3



Amino Acid Sequence for a 400 bp Repeat in the CA125 Molecule
(SEQ ID NO: 11 thru SEQ ID NO: 21)



      1                                                    50
12    ERVLQGLLRS LFKSTSVGPL YSGCRLTLLR PEKDGTATGV DAICTHHPDP (SEQ ID NO: 11)
34    ERVLQGLLMP LFKNTSVSSL YSGCRLTLLR PEKDGAATRA DAVCTHRPDP (SEQ ID NO: 12)
32    ERVLQGLLGP IFKNTSVGPL YSGCRLTSLR SEKDGAATGV DAICIHRLDP (SEQ ID NO: 13)
46    ERVLQGLLGP MFKNTSVGLL YSGCRLTLLR PEKNGAATGM DAICSHRLDP (SEQ ID NO: 14)
33    ERVLQGLLGP LFKNSSVGPL YSGCRLISLR SEKDGAATGV DAICTHHLNP (SEQ ID NO: 15)
15    ERVLQGLLRP LFKSTSAGPL YSGCRLTLLR PEKHGAATGV DAICTLRLDP (SEQ ID NO: 16)
35    ERVLQGLLKP LFKSTSVGPL YSGCRLTLLR PEKRGAATGV DTICTHRLDP (SEQ ID NO: 17)
111   ERVLQGLLTP LFKNTSVGPL YSGCRLTLLR PEKQEAATGV DTICTHRVDP (SEQ ID NO: 18)
42    ERVLQGLLKP LFKNTSVGPL YSGCRLTLLR PEKHEAATGV DTICTHRLDP (SEQ ID NO: 19)
116   ERVLQGLLSP IFKNSSVGPL YSGCRLTSLR PEKDGAATGM DAVCLYHPNP (SEQ ID NO: 20)
23    ERVLQGLLRP LFKNTSIGPL YSSCRLTLLR PEKDKAATRV DAICTHHPDP (SEQ ID NO: 21)

      51                                                  100
12    KSPRLDREQL YWELSQLTHN ITELGPYALD NDSLFVNGFT HRSSVSTTST
34    KSPGLDRERL YWKLSQLTHG ITELGPYTLD RHSLYVNGFT HQSSMTTTRT
32    KSPGLNREQL YWELSKLTND IEELGPYTLD RNSLYVNGFT HQSSVSTTST
46    KSPGLNREQL YWELSQLTHG IKELGPYTLD RNSLYVNGFT HRSSVAPTST
33    QSPGLDREQL YWQLSQMTNG IKELGPYTLD RNSLYVNGFT HRSSGLTTST
15    TGPGLDRERL YWELSQLTNS VTELGPYTLD RDSLYVNGFT HRSSVPTTSI
35    LNPGLDREQL YWELSKLTRG IIELGPYTLD RDSLYVNGFT HRSSVPTTSI
111   IGPGLDRERL YWELSQLTNS ITELGPYTLD RDSLYVDGFN PWSSVPTTST
42    LNPGLDREQL YWELSKLTRG IIELGPYLLD RGSLYVNGFT HRNFVPITST
116   KRPGLDREQL YWELSQLTHN ITELGPYSLD RDSLYVNGFT HQNSVPTTST
23    QSPGLNREQL YWELSQLTHG ITELGPYTLD RDSLYVDGFT HWSPIPTTST

      101                                                 150
12    PGTPTVYLGA SKTPASIFGP S..AASPLLI PFT
34    PDTSTMHLAT SRTPASLSGP T..TASPLLI PF
32    PGTSTVDLRT SGTPSSLSSP TIMAAGPLLI PF
46    PGTSTVDLGT SGTPSSLPSP T..TAVPLLI PF
33    PWTSTVDLGT SGTPSPVPSP T..TAGPFLI PF
15    PGTSAVHLET SGTPASLPGH T..APGPLLI PF
35    PGTSAVHLET SGTPASLPGH I..VPGPLLI PF
111   PGTSTVHLAT SGTPSPLPGH T..APVPLLI PFT
42    PGTSTVHLGT SETPSSLPRP I..VPGPLLV PFT
116   PGTSTVYWAT TGTPSSFPGH T..EPGPLLI PF
23    PGTSIVNLGT SGIPPSLPET T..ATGPLLI PFT
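The degree of conservation between repeat units in Table 3 can be quantified as pairwise percent identity over aligned positions. The sketch below compares the first 50 residues of clones 12 and 23 as they appear in the table; the function name is illustrative, and the comparison assumes the two rows are already aligned position by position, as they are in the table.

```python
def count_identical(seq_a: str, seq_b: str) -> int:
    """Number of aligned positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


# Positions 1-50 of two rows of Table 3, with column spacing removed.
CLONE_12 = "ERVLQGLLRSLFKSTSVGPLYSGCRLTLLRPEKDGTATGVDAICTHHPDP"
CLONE_23 = "ERVLQGLLRPLFKNTSIGPLYSSCRLTLLRPEKDKAATRVDAICTHHPDP"

matches = count_identical(CLONE_12, CLONE_23)
print(f"{matches}/{len(CLONE_12)} identical "
      f"({100 * matches / len(CLONE_12):.0f}% identity)")
```

The same function can be applied to any other pair of rows, provided alignment gaps (shown as dots in the table) are kept in place so that positions stay in register.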
TABLE 4



Amino Acid Sequence for a 800 bp Repeat in the CA125 Molecule
(SEQ ID NO: 22 thru SEQ ID NO: 35)



      1                                                    50
79    ERVLQGLLKP LFRNSSLEYL YSGCRLASLR [illegible] [illegible] (SEQ ID NO: 22)
811   ERVLQGLLKP LFRNSSLEYL YSGCRLASLR PEKDSSAMAV DAICTHRPDP (SEQ ID NO: 23)
21    ERVLQGLLKP LFKSTSVGPL YSGCRLTLLR [illegible] [illegible] (SEQ ID NO: 24)
89    ERVLQGLLKP LFKSTSVGPL YSGCRLTLLR PEKRGAATGV DTICTHRLDP (SEQ ID NO: 25)
85    ERVLQGLLKP LFKSTSVGPL [illegible] [illegible] DTICTHRLDP (SEQ ID NO: 26)
712   ERVLQGLLKP LFKSTSVGPL [illegible] [illegible] DTICTHRLDP (SEQ ID NO: 27)
86    ERVLQGLLKP LFKSTSVGPL YSGCRLTLLR [illegible] [illegible] (SEQ ID NO: 28)
87    ERVLQGLLTP LFKNTSVGPL [illegible] [illegible] [illegible] (SEQ ID NO: 29)
810   ERVLQGLLRP LFKNTSIGPL [illegible] [illegible] [illegible] (SEQ ID NO: 30)
83    ERVLQGLLRP VFKNTSVGPL YSGCRLTLLR [illegible] [illegible] (SEQ ID NO: 31)
81    ERVLQGLLGP MFKNTSVGLL YSGCRLTLLR [illegible] [illegible] (SEQ ID NO: 32)
44    ERVLQGLLKP LFKSTSVGPL YSGCRLTLLR [illegible] [illegible] (SEQ ID NO: 33)
812   ERVLQGLLSP ISKNSSVGPL YSGCRLTSLR [illegible] [illegible] (SEQ ID NO: 34)
76    ERVLQGLLSP IFKNSSVGSL YSGCRLTLLR [illegible] [illegible] (SEQ ID NO: 35)

      51                                                  100
79    EDLGLDRERL YWELSNLTNG IQELGPYTLD [illegible] [illegible]
811   EDLGLDRERL YWELSNLTNG IQELGPYTLD RNSLYVNGFT HRSSGLTTST
21    LNPGLDREQL YWELSKLTRG IIELGPYLLD [illegible] [illegible]
89    LNPGLDREQL YWELSKLTRG IIELGPYLLD RGSLYVNGFT HRNFVPITST
85    LNPGLDREQL YWELSKLTRG IIELGPYLLD [illegible] [illegible]
712   LNPGLDREQL YWELSKLTRG IIELGPYLLD [illegible] [illegible]
86    TGPGLDRERL YWELSQLTNS VTELGPYTLD RDSLYVNGFT HRSSVPTTSI
87    IGPGLDRERL YWELSQLTNS ITELGPYTLD RDSLYVNGFN PWSSVPTTST
810   QSPGLNREQL YWELSQLTHG ITELGPYTLD RDSLYVDGFT HWSPIPTTST
83    KSPGLDREQL YWELSQLTHS ITELGPYTLD RDSLYVNGFT QRSSVPTTSI
81    KSPGLDREQL YWELSQLTHS ITELGPYTLD RDSLYVNGFT QRSSVPTTSI
44    KRPGLDREQL YCELSQLTHD ITELGPYSLD RDSLYVNGFT HQNSVPTTST
812   KRPGLDREQL YWELSQLTHN ITELGPYSLD RDSLYVNGFT HQNSVPTTST
76    KSPGLDRERL YWKLSQLTHG ITELGPYTLD RHSLYVNGFT HQSSMTTTRT

      101                                                 150
79    PGTSTVDVGT SGTPSSSPSP TTAGPLLMPF TLNFTITNLQ YEEDMRRTGS
811   PWTSTVDLGT SGTPSPVPSP TTAGPLLIPF TLNFTITNLQ YEENMGHPGS
21    PGTSTVDLGT SGTPFSLPSP ATAGPLLVLF TLNFTITNLK YEEDMHRPGS
89    PGTSTVHLGT SETPSSLPRP IVPGPLLIPF TINFTITNLR YEENMHHPGS
85    PDTSTMHLAT SRTPASLSGP TTASPLLIPF TLNFTITNLQ YEENMGHPGS
712   PGTSAVHLET FGTPASLHGH TAPGPVLVPF TLNFTITNLQ YEEDMRHPGS
86    PGTSAVHLET SGTPASLPGH TAPGPLLVPF TLNFTITNLQ YEEDMRHPGS
87    PGTSTVHLAT SGTPSSLPGH TAPVPLLIPF TLNFTITNLH YEENMQHPGS
810   PGTSIVNLGT SGIPPSLPET TATGPLLIPF TPNFTITNLQ YEEDMRRTGS
83    PGTPTVDLGT SGTPVSKPGP SAASPLLVPF TLNFTITNLQ YEEDMHRPGS
81    PGTPTVDLGT SGTPVSKPGP SAASPLLIPF TINFTITNLR YEENMGHPGS
44    PGTSTVYWAT TGTPSSFPGH TEPGPLLIPF TFNFTITNLH YEENMQHPGS
812   PGTSTVYWAT TGTPSSFPGH TEPGPLLIPF TVNFTITNLR YEENMHHPGS
76    PDTSTMHLAT SRTPASLSGP TTASPLLVLF TINFTITNQR YEENMHHPGS
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TABLE 4 -continued 



Amino Acid Sequence for a 800 bp Repeat in the CA125 Molecule 
(SEQ ID NO: 22 thru SEQ ID NO: 35)

151 200

79 RKFNTMERVL QGLLSPIFKN SSVGPLYSGC RLTSLRPEKD GAATGMDAVC
811 RKFNIMERVL QGLLMPLFKN TSVSSLYSGC RLTLLRPEKD GAATRVDAVC
21 RKFNTTERVL QTLLGPMFKN TSVGLLYSGC RLTLLRSEKD GAATGVDAIC
89 RKFNIMERVL QGLLGPLFKN SSVGPLYSGC RLISLRSEKD GAATGVDAIC
85 RKFNIMERVL QGLLNPIFKN SSVGPLYSGC RLTSLKPEKD GAATGMDAVC
712 RKFNTTERVL QGLLKPLFKS TSVGPLYSGC RLTLLRPEKR GAATGVDTIC
86 RKFNTTERVL QGLLKPLFKS TSVGPLYSGC RLTLLRPEKR GAATGVDTIC
87 RKFNTTERVL QGLLKPLFKS TSVGPLYSGC RLTLLRPEKH GAATGVDAIC
810 RKFNTMERVL QGLLSPIFKN SSVGPLYSGC RLTSLRPEKD GAATGMDAVC
83 RKFNATERVL QGLLSPIFKN SSVGPLYSGC RLTSLRPEKD GAATGMDAVC
81 RKFNIMERVL QGLLKPLFKN TSVGPLYSGC RLTLLRPKKD GAATGVDAIC
44 RKFNTTERVL QGLLKPLFKN TSVGPLYSGC RLTLLRPEKH EAATGVDTIC
812 RKFNTTERVL QGLLRPVFKN TSVGPLYSGC RLTLLRPKKD GAATKVDAIC
76 RKFNTTERVL QGLLRPVFKN TSVGPLYSGC RLTLLRPKKD GAATKVDAIC

201 250

79 LYHPNPKRPG LDREQLYWEL SQLTHNITEL GPYSLDRDSL YVNGFTHQNS
811 TQRPDPKSPG LDRERLYWKL SQLTHGITEL GPYTLDRHSL YVNGLTHQSS
21 THRLDPKSPG VDREQLYWEL SQLTNGIKEL GPYTLDRNSL YVNGFTHWIP
89 THHLNPQSPG LDREQLYWQL SQMTNGIKEL GPYTLDRNSL YVNGFTHRSS
85 LYHPNPKRPG LDREQLYWEL SQLTHGIKEL GPYTLDRNSL YVNGFTHRSS
712 THRLDPLNPG LDREQLYWEL SKLTRGIIEL GPYLLDRGSL YVNGFTHRNF
86 THRLDPLNPG LDREQLYWEL SKLTRGIIEL GPYLLDRGSL YVNGFTHRNF
87 THRLDPKSPG VDREQLYWEL SQLTNGIKEL GPYTLDRNSL YVNGFTHWIP
810 LYHPNPKRPG LDREQLY
83 LYHPNPKRPG LDREQLYWEL SQLTHNITEL GPYSLDRDSL YVNGFTHQSS
81 THRLDPKSPG LNREQLYWEL SKLTNDIEEL GPYTLDRNSL YVNGFTHQSS
44 THRVDPIGPG LDRERLYWEL SQLTNSIHEL GPYTLDRDSL YVNGFNPRSS
812 TYRPDPKSPG LDREQLYWEL SKLTNDIEEL GPYTLDRNSL YVNGFTHQSS
76 TYRPDPKSPG LDREQLYWEL SQLTHSITEL GPYTQDRDSL YVNGFTHRSS

251 288

79 VPTTSTPGTS TVYWATTGTP SSFPGHT..E PGPL
811 MTTTRTPDTS TMHLATSRTP ASLSGPT..T ASPLLIPF
21
89 GLTTSTPWTS TVDLGTSGTP SPVPSPT..T AGPLLIPF
85 VAPTSTPGTS TVDLGTSGTP SSLPSPT..T AVPLLIPF
712 VPITSTPGTS TVHLGTSETP SSLPRPI..V PGPLLIPF
86 VPITSTPGTS TVHLGTSETP SSLPRPI..V PGPLLIPF
87 VPTSSTPGTS TVDLG.SGTP SSLPSPT..T AGPL
810
83 MTTTRTPDTS TMHLATSRTP ASLSGPT..T ASPLLIPF
81 VSTTSTPGTS TVDLRTSGTP SSLSSPTIMA AGPLLIPF
44 VPTTSTPGTS TVHLATSGTP SSLPGHT..A PVPLLI--
812 VSTTSTPGTS TVDLRTSGTP SSLSSPTIMA AGPLLIPF
76 VPTTSIPGTS AVHLETSGTP ASLP
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TABLE 5 



Amino Acid Sequence for a 1200 bp Repeat in the CA125 Molecule
(SEQ ID NO: 36 thru SEQ ID NO: 46) 



91 „ ».™ -™ ™™ is s s: "! 



910 ERVLQGLLUF wrhi^ io »w«-^-. I" nTICTHRVDP (SEQ ID NO: 37) 

,. ,, ; = s KXS E2— — ; s: •» 

» ■» =s: ss= :=~ ^ S Siiii 

si ====== = := = 



20 ioo 

51 

910 
99 

W 112 
25^ 95 



T NPCLDREOL YWELSKLTRG IIELGPYLLD RGSLYVNGFT HRNFVPITST 
IGPGLDRERL YWELSQLTNS ITELGPYTLD RDSLYVNGFN PWSSVPTTST 

^pSdreql ywelsqlthn itelgpysld rdslyvngft HQNSVPTTST 

VJ 95 SpgSrEQL YWELSQLTHN ITELGPYSLD RDSLYVNGFT HQNSVPTTST 

5 ni SpgvSreql YWELSQLTNG IKELGPYTLD RNSLYVNGFT hqtsapntst 

i 78 SpgSl YWELSKLTND IEELGPYTLD RNSLYVNGFT HQSSVSTTST 

S 115 KIPGLDRQQL YWELSQLTHS ITELGPYTLD RDSLYVNGFT QRSSVPTTST 

I 91 1 YWELSNLTNG IQELGPYTLD RNSLYVNGFT HRSSMPTTST 

V- 92 LNPGLDREQL YWELSKLTRG IIELGPYLLD RGSLYVNGFT HRNFVPITST 

fcU 113 KSPgSrEQL YWELSQLTHG IKELGPYTLD RNSLYVNGFT HRSSVAPTST 

» 111 TGPGLDrIS YWELSQLTNS VTELGPYTLD RDSLYVNGFT HRSSVPTTSI 

150 

^3 qiO PGTSTVHLGT SETPSSLPRP IV..PGPLLV PFTLNFTITN LQYEEAMRHP 

3 m 91 9 ° 9 pGTSTVHLAT SGTPSSLPGH TA..PVPLLI PFTLNFTITN LHYEENMQHP 

M T12 PGTSTVYWAT TGTPSSFPGH T . . EPGPLLI PFTLNFTITN LQYEENMGHP 

£ 95 PGTSTVYWAT TGTPSSFPGH T.. EPGPLLI PFTLNFTITN LQYEENMGHP 

71 TOLGT SGTPSSLPSP T . . SAGPLLI PFTINFTITN LRYEENMHHP 

4(f 78 PgS^LrJ SOTPSSLSSP TIMAAGPLLI PFTINFTITN LRYEENMHHP 

115 Itll^ll SETPSSLPGP T . . ATGPVLL PFTLNFTIIN LQYEEDMHRP 

91 PGTSTVDVGT SGTPSSSPSP T . . TAGPLLM PFTLNFTITN LQYEEDMRRT 

92 PgS^LOT SETPSSLPRP IV. . PGPLLI PFTLNFTITN LQYEENMGHP 
113 POTSTVDLGT SGTPSSLPSP T..TAVPLLI PFTLNFTITN LKYEEDMHCP 

45 711 PGtSvSlS SGTPASLPGH T. .APGPLLI PFTLNFTITN LHYEENMQHP 

200 

910 gsrkfntter VLQGLLRPLF kntsvsslys gcrltllrpe kdgaatrvda 
11 GSRKFNTTER vlqgllkplf kntsvgplys gcrltlfkpe KHEAATGVDA 
50 ii gIrSnitS vlqglltplf knssvgplys gcrlislrse kdgaatgvda 

95 GSRKFNITER VLQGLLNPIF KNSSVGPLYS GCRLTSLRPE KDGAATGMDA 

ni SrKFNTMER VLQGLLKPLF KSTSVGPLYS GCRLTLLRPE KDGVATRVDA 

78 gKkfSeR VLQGLLMPLF KNTSVSSLYS GCRLTLLRPE KDGAATRVDA 

H5 gKkfSeR VLQGLLMPLF KNTSVGPLYS GCRLTLLRPE KQEAATGVDT 

55 91 gRkFNTMES VLQGLLKPLF KNTSVGPLYS GCRLTLLRPK KDGAATGVDA 

92 GSRKFNItS VLQGLLKPLF RNSSLEYLYS GCRLTSLRPE KDSSTMAVDA 
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TABLE 5 -continued 



(SEQ ID NO: 36 thru SEQ ID NO: 46)



£ SEE S3SSSS = see = 

10 250 



910 
99 
112 



ACTYRPDPKS PGLDREQLYW ELSQLTHSIT ELGPYTLDRV SLYVNGFNPR 

SlrlSpS pgldrerlyw elsqltnsvt elgpytldrd slyvngfthr 

iShHLNPQS PGLDREQLYW QLSQMTNGIK ELGPYTLDRD ^YVNGFTHR 
15 95 VCLYHPNPKR PGLDREQLYC ELSQLTHNXX — - = 

. ! mmmmm 

Al XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXGPYTLDRN SLYVNGFTHR 



300 



4 •:; - ;=s s= ~- 

I IS i= S= =i =| 
"i .2 l=EEE5i==| 



91 
92 



,| £ «=m tv.^ppt 



350 



q 9X0 EEDMRHPGSR KFNTMERVLQ GLLRPLFKNT SIGPLYSSCR ^LLRPE^K 
U 99 SZrTGSR KFNTMERVLQ GLLKPLFKST SVGPLYSGCR LTLLRPEKRG 
40 112 EENMGHPGSR KFNIMERVLQ Gj^Vjm S^L^CR 

S EEDMHRPGSR SEES SSS3S SVGPLYSGCR LTL_ 

78 IeDMRRTGSR KFNTMERVLQ GLLKPLFKST SVGPLYSGCR ^LLRPEKHG 

H EEDMHRPGSR KFNTTERVLQ GLLMPLFKNT SVGPLYSGCR ^LLRPEKQE 

4S 91 EENMGHPGSR KFNIMERVLQ GLLMPLFKNT SVSSLYSGCR J-TLLRPETOG 

92 EEDMRRTGSR KFNTMESVLQ GLLKPLFKNT SVGPLYSGCR ^LLRPKKDG 

, EEDMRRTGSR KFNTMESVLQ GLLKPLFKNT SVGPLYSGCR LTLLRPEKDG 

"l SZhPGSR KFNTTERVLQ GLLGPLFKNS SVGPLYSGCR LISLRSEKDG 

400 

50 ARTRVDAICT HHPDPQSPGL NREQLYWELS QLTHGITEL 

-t^StCT HRLDPLNPGL DREQLYWELS KLTRGIIELG PYLLDRGSLY 
tiSSSSS YRpSpKSPGL DREQLYWELS QLTHSITELG PYTLDRDSLY 
SSSS IrLDpSpGL DRERLYWELS QLTNSVTELG PYTLDRDSLY 
ttSvSAlS lSdPTGPGL DRERLYWELS QLTNSITELG PYTLDRDSLY 
L^LDPTGPGL DRERLYWELS QLTNSVTELG PYTLDRDSLY 



910 
99 

112 
95 

55 7i 

78 
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TABLE 5 -continued 



Amino Acid Sequence for a 1200 bp Repeat in the CA125 Molecule
(SEQ ID NO: 36 thru SEQ ID NO: 46) 



.., RATGVDTICT HRVDPIGPGL DRERLYWELS QLTNSITELG PYTLDRDSLY 

91 AATRVVAVCT HRPDPKSPGL DRERLYWKLS QLTHGITELG PYTLDRHSLY 

92 A^AICT HRLDPKSPGL NREQLYWELS KLTNDIEELG PYTLDRNSLY 

in AATnVDATCT HRLDPKSPGL NREQLYWELS KL ™ Tf , T v 

"l aISSS HHLNPQSPGL DREQLYWQLS QVTNGIKELG PYTLDRNSLY 

447 

401 



910 



99 VNGFTHRNFV PITSTPGTST VHLGTSEIHP SLPRPI • - VP GPL—- 

112 VNGFTQRSSV PTTSIPGTPT VDLGTSGTPV SKPGPS . . AA SP- — 

95 VNGFTHRSSV PTTSIPGTSA VHLETSGTPA SLPGHT. .AP GPLL— 

71 VNGFNPWSSV PTTSTPGTST VHLATSGTPS SLPGHT. .AP VPL---- 

n* VNGFTHRSSV PTTSIPGTSA VHLETSGTPA SLPGHT. .AP GPLLIPF 

1 7 1 WGFNPWSSV PTTSTPGTST VHLATSGTPS SLPGHT . .AP VPLL PF 

qi VNrrFTHOSSM TTTRTPDTST MHLATSRTPA SLSGPT..TA SPLLIPF 

92 WGF™QSSV STTSTPGTST VDPRTSGTPS SLSSPTIMAA GPLLI-- 

7" WGFTHRSSG LTTSTPWTST VDLGTSGTPS PVPSPT..TA GPLLI-- 
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TABLE 6 



Amino Acid Sequence for a 9 Repeat Structure in the CA125 Molecule
(SEQ ID NO: 47)



ERVLQGLLKP LFRNSSLEYL «*CBW ™^ HRSSMPTTST 
EDLGLDRERL YWELSNLTNG ^LGPYTLD ^LYVN YEEDMRRTGS 
10 PGTSTVDVGT SGTPSSSPSP ™PLLMPF GAATGM DAV 

RKFNTMERVL QGPLSPIFKN SSVGPLYSGC « LY VNGFTHQN 
CLYHPNPKRP GLDREQLYWE ^LTHNITE "™ FTITNLQYEE 

svpttstpgt stvywattgt pssfpghtep ^ SLRPE kdgaa 

NMGHPGSRKF NITERVLQGL ^PIF^SSV ^SGC 
15 TGMDAVCLYH PNPKRPGLDR ^YCELSQL THNI IPFTLNFTIT 
GFTHQNSVPT TSTPGTSTVY "ATTGTPSSF ™ gGCRLTLLRP 

nlqyeedmrr tgsrkfntme RVlqgllkpl fkst ytldr 
ekhgaatgvd aictlrldpt gpgldrerly welsqltns LVppT 

DSLYVNGFTH JSSVPTTSIP JTSAVHLETS GTP^^ SVQpLYSGCR 
20 LNFTITNLQY EEDMRHPGSR KFNTlhKV u KLTRGIIELG 
LTLLRPEKRG AATGVDTICT HRLDPLNPGL ^QLYW SLPR p IV p GP 
PYLLDRGSLY VNGFTHRNFV ^TPGTST ™^ PLFRNSSLEY 
„ LLIPFTLNFT ITNLQYEENM ^SRKFNI ™ LYWELSNLTN 
^ LYSGCRLASL RPEKDSSAMA ™^THRPD ^ TSG TPSSSPS 
2$3 GIQELGPYTL DRNSLYVNGF THRSSMPTTS "JTSTV LQGLLKPLFK 
% PTTAGPLLMP FTLNFTITNL QYEEDM^TG SRKFNTMES 

tf» NTSVGPLYSG CRLTLLRPKK DGAATGVDAI CTHRLDPK ^ 
ill LSKLTNDIEE J£55S SShPGSRKF NIMERVLQGL 

N PSSLPSPTTG VPLLIPFTLN F ™U£ TPVDAVCTHR PDPKSPGLDR 
m .SPIFKHSSV OSLYSGCRLT ^RPEKDGAA ^CT^ 

? SSSSS SGPTTASPLL -J™— ™ ™ 

° SSSSS SEE ESSE rs^p 



GTSAVHLETS GTPASLP 




TABLE 7 

& 2) 



4«n * AK024365 Encompasses Repeat Sequences (Repeats 1 
cDNA Genban* ^- S -- 1 ^° s 24 to 6 Lo Repeats Shown in Table 6 

(SEQ ID NO: 48) 



MPL ™ T SVS SKYSGCRLTI, LR~T K— KP DP~E 

EKE "= sees S- —s 

VLQGLLRPVF KNTSVGPLYS GCRLTLLRPK KD^KVDA 

3SSS S353K EEg EE S53S 
SEES SSSES EES £™ SSSS 

VLVTVKALFS SNLDPSLVEQ ™KTLNAS ^LGSTYQL VDI 



35" 
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TABLE 8 



Complete DNA Sequence for 13 Repeats including the Carboxy Terminus of CA125
(SEQ ID NO: 49)

1 GAGAGGGTTC TGCAGGGTCT GCTCAAACCC TTGTTCAGGA ATAGCAGTCT
51 GGAATACCTC TATTCAGGCT GCAGACTAGC CTCACTCAGG CCAGAGAAGG
101 ATAGCTCAGC CATGGCAGTG GATGCCATCT GCACACATCG CCCTGACCCT
151 GAAGACCTCG GACTGGACAG AGAGCGACTG TACTGGGAGC TGAGCAATCT
201 GACAAATGGC ATCCAGGAGC TGGGCCCCTA CACCCTGGAC CGGAACAGTC
251 TCTATGTCAA TGGTTTCACC CATCGAAGCT CTATGCCCAC CACCAGCACT
301 CCTGGGACCT CCACAGTGGA TGTGGGAACC TCAGGGACTC CATCCTCCAG
351 CCCCAGCCCC ACGACTGCTG GCCCTCTCCT GATGCCGTTC ACCCTCAACT
401 TCACCATCAC CAACCTGCAG TACGAGGAGG ACATGCGTCG CACTGGCTCC
451 AGGAAGTTCA ACACCATGGA GAGGGTTCTG CAGGGTCCGC TTAGTCCCAT
501 ATTCAAGAAC TCCAGTGTTG GCCCTCTGTA CTCTGGCTGC AGACTGACCT
551 CTCTCAGGCC CGAGAAGGAT GGGGCAGCAA CTGGAATGGA TGCTGTCTGC
601 CTCTACCACC CTAATCCCAA AAGACCTGGG CTGGACAGAG AGCAGCTGTA
651 CTGGGAGCTA AGCCAGCTGA CCCACAACAT CACTGAGCTG GGCCCCTACA
701 GCCTGGACAG GGACAGTCTC TATGTCAATG GTTTCACCCA TCAGAACTCT
751 GTGCCCACCA CCAGTACTCC TGGGACCTCC ACAGTGTACT GGGCAACCAC
801 TGGGACTCCA TCCTCCTTCC CCGGCCACAC AGAGCCTGGC CCTCTCCTGA
851 TACCATTCAC GCTCAACTTC ACCATCACTA ACCTACAGTA TGAGGAGAAC
901 ATGGGTCACC CTGGCTCCAG GAAGTTCAAC ATCACGGAGA GGGTTCTGCA
951 GGGTCTGCTT AATCCCATTT TCAAGAACTC CAGTGTTGGC CCTCTGTACT
1001 CTGGCTGCAG ACTGACCTCT CTCAGGCCCG AGAAGGATGG GGCAGCAACT
1051 GGAATGGATG CTGTCTGCCT CTACCACCCT AATCCCAAAA GACCTGGGCT
1101 GGACAGAGAG CAGCTGTACT GCGAGCTAAG CCAGCTGACC CACAACATCA
1151 CTGAGCTGGG CCCCTACAGC TTGGACAGGG ACAGTCTTTA TGTCAATGGT
TABLE 8 -continued

Complete DNA Sequence for 13 Repeats including the Carboxy Terminus of CA125
(SEQ ID NO: 49)





1201 TTCACCCATC AGAACTCTGT GCCCACCACC AGTACTCCTG GGACCTCCAC
1251 AGTGTACTGG GCAACCACTG GGACTCCATC CTCCTTCCCC GGCCACACAG
1301 AGCCTGGCCC TCTCCTGATA CCATTCACCC TCAACTTCAC CATCACCAAC
1351 CTGCAGTACG AGGAGGACAT GCGTCGCACT GGCTCCAGGA AGTTCAACAC
1401 CATGGAGAGG GTTCTGCAGG GTCTGCTCAA GCCCTTGTTC AAGAGCACCA
1451 GCGTTGGCCC TCTGTACTCT GGCTGCAGAC TGACCTTGCT CAGACCTGAG
1501 AAACATGGGG CAGCCACTGG AGTGGACGCC ATCTGCACCC TCCGCCTTGA
1551 TCCCACTGGT CCTGGACTGG ACAGAGAGCG GCTATACTGG GAGCTGAGCC
1601 AGCTGACCAA CAGCGTTACA GAGCTGGGCC CCTACACCCT GGACAGGGAC
1651 AGTCTCTATG TCAATGGCTT CACCCATCGG AGCTCTGTGC CAACCACCAG
1701 TATTCCTGGG ACCTCTGCAG TGCACCTGGA AACCTCTGGG ACTCCAGCCT
1751 CCCTCCCTGG CCACACAGCC CCTGGCCCTC TCCTGGTGCC ATTCACCCTC
1801 AACTTCACTA TCACCAACCT GCAGTATGAG GAGGACATGC GTCACCCTGG
1851 TTCCAGGAAG TTCAACACCA CGGAGAGAGT CCTGCAGGGT CTGCTCAAGC
1901 CCTTGTTCAA GAGCACCAGT GTTGGCCCTC TGTACTCTGG CTGCAGACTG
1951 ACCTTGCTCA GGCCTGAAAA ACGTGGGGCA GCCACCGGCG TGGACACCAT
2001 CTGCACTCAC CGCCTTGACC CTCTAAACCC TGGACTGGAC AGAGAGCAGC
2051 TATACTGGGA GCTGAGCAAA CTGACCCGTG GCATCATCGA GCTGGGCCCC
2101 TACCTCCTGG ACAGAGGCAG TCTCTATGTC AATGGTTTCA CCCATCGGAA
2151 CTTTGTGCCC ATCACCAGCA CTCCTGGGAC CTCCACAGTA CACCTAGGAA
2201 CCTCTGAAAC TCCATCCTCC CTACCTAGAC CCATAGTGCC TGGCCCTCTC
2251 CTGATACCAT TCACACTCAA CTTCACCATC ACTAACCTAC AGTATGAGGA
2301 GAACATGGGT CACCCTGGCT CCAGGAAGTT CAACATCACG GAGAGGGTTC
2351 TGCAGGGTCT GCTCAAACCC TTGTTCAGGA ATAGCAGTCT GGAATACCTC
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TABLE 8 -continued 



Complete DNA Sequence for 13 Repeats including the Carboxy Terminus of CA125
(SEQ ID NO: 49)

2401 TATTCAGGCT GCAGACTAAC CTCACTCAGG CCAGAGAAGG ATAGCTCAAC 
2451 CATGGCAGTG GATGCCATCT GCACACATCG CCCTGACCCT GAAGACCTCG 
2501 GACTGGACAG AGAGCGACTG TACTGGGAGC TGAGCAATCT GACAAATGGC 
2551 ATCCAGGAGC TGGGCCCCTA CACCCTGGAC CGGAACAGTC TCTATGTCAA
2601 TGGTTTCACC CATCGAAGCT CTATGCCCAC CACCAGCACT CCTGGGACCT 
2651 CCACAGTGGA TGTGGGAACC TCAGGGACTC CATCCTCCAG CCCCAGCCCC 
2701 ACGACTGCTG GCCCTCTCCT GATGCCGTTC ACCCTCAACT TCACCATCAC
2751 CAACCTGCAG TACGAGGAGG ACATGCGTCG CACTGGCTCC AGGAAGTTCA 
2801 ACACCATGGA GAGTGTCCTG CAGGGTCTGC TCAAGCCCTT GTTCAAGAAC 

2851 ACCAGTGTTG GCCCTCTGTA CTCTGGCTGC AGATTGACCT TGCTCAGGCC
2901 CAAGAAAGAT GGGGCAGCCA CTGGAGTGGA TGCCATCTGC ACCCACCGCC 
2951 TTGACCCCAA AAGCCCTGGA CTCAACAGGG AGCAGCTGTA CTGGGAGTTA 

3001 AGCAAACTGA CCAATGACAT TGAAGAGGTG GGCCCCTACA CCTTGGACAG
3051 GAACAGTCTC TATGTCAATG GTTTCACCCA TCGGAGCTTT GTGGCCCCCA 
3101 CCAGCACTCT TGGGACCTCC ACAGTGGACC TTGGGACCTC AGGGACTCCA 
3151 TCCTCCCTCC CCAGCCCCAC AACAGGTGTT CCTCTCCTGA TACCATTCAC 
3201 ACTCAACTTC ACCATCACTA ACCTACAGTA TGAGGAGAAC ATGGGTCACC 
3251 CTGGCTCCAG GAAGTTCAAC ATCATGGAGA GGGTTCTGCA GGGTCTGCTT 
3301 ATGCCCTTGT TCAAGAACAC CAGTGTCAGC TCTCTGTACT CTGGTTGCAG
3351 ACTGACCTTG CTCAGGCCTG AGAAGGATGG GGCAGCCACC AGAGTGGTTG
3401 CTGTCTGCAC CCATCGTCCT GACCCCAAAA GCCCTGGACT GGACAGAGAG
3451 CGGCTGTACT GGAAGCTGAG CCAGCTGACC CACGGCATCA CTGAGCTGGG
3501 CCCCTACACC CTGGACAGGC ACAGTCTCTA TGTCAATGGT TTCACCCATC
3551 AGAGCTCTAT GACGACCACC AGAACTCCTG ATACCTCCAC AATGCACCTG 
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TABLE 8 -continued 



Complete DNA Sequence for 13 Repeats including the Carboxy Terminus of CA125
(SEQ ID NO: 49)



3601 GCAACCTCGA GAACTCCAGC CTCCCTGTCT GGACCTACGA CCGCCAGCCC 
3651 TCTCCTGATA CCATTCACAA TTAACTTCAC CATCACTAAC CTGCGGTATG 
3701 AGGAGAACAT GCATCACCCT GGCTCTAGAA AGTTTAACAC CACGGAGAGA 
3751 GTCCTTCAGG GTCTGCTCAG GCCTGTGTTC AAGAACACCA GTGTTGGCCC 
3801 TCTGTACTCT GGCTGCAGAC TGACCTTGCT CAGGCCCAAG AAGGATGGGG 
3851 CAGCCACCAA AGTGGATGCC ATCTGCACCT ACCGCCCTGA TCCCAAAAGC 
3901 CCTGGACTGG ACAGAGAGCA GCTATACTGG GAGCTGAGCC AGCTAACCCA 
3951 CAGCATCACT GAGCTGGGCC CCTACACCCT GGACAGGGAC AGTCTCTATG 
4001 TCAATGGTTT CACACAGCGG AGCTCTGTGC CCACCACTAG CATTCCTGGG 
4051 ACCCCCACAG TGGACCTGGG AACATCTGGG ACTCCAGTTT CTAAACCTGG 
4101 TCCCTCGGCT GCCAGCCCTC TCCTGGTGCT ATTCACTCTC AACTTCACCA 
4151 TCACCAACCT GCGGTATGAG GAGAACATGC AGCACCCTGG CTCCAGGAAG 
4201 TTCAACACCA CGGAGAGGGT CCTTCAGGGC CTGCTCAGGT CCCTGTTCAA 
4251 GAGCACCAGT GTTGGCCCTC TGTACTCTGG CTGCAGACTG ACTTTGCTCA 
4301 GGCCTGAAAA GGATGGGACA GCCACTGGAG TGGATGCCAT CTGCACCCAC

4351 CACCCTGACC CCAAAAGCCC TAGGCTGGAC AGAGAGCAGC TGTATTGGGA 
4401 GCTGAGCCAG CTGACCCACA ATATCACTGA GCTGGGCCAC TATGCCCTGG 
4451 ACAACGACAG CCTCTTTGTC AATGGTTTCA CTCATCGGAG CTCTGTGTCC 
4501 ACCACCAGCA CTCCTGGGAC CCCCACAGTG TATCTGGGAG CATCTAAGAC 
4551 TCCAGCCTCG ATATTTGGCC CTTCAGCTGC CAGCCATCTC CTGATACTAT 
4601 TCACCCTCAA CTTCACCATC ACTAACCTGC GGTATGAGGA GAACATGTGG 
4651 CCTGGCTCCA GGAAGTTCAA CACTACAGAG AGGGTCCTTC AGGGCCTGCT 
4701 AAGGCCCTTG TTCAAGAACA CCAGTGTTGG CCCTCTGTAC TCTGGCTCCA 
4751 GGCTGACCTT GCTCAGGCCA GAGAAAGATG GGGAAGCCAC CGGAGTGGAT 
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TABLK 8 -continued 



Complete DNA Sequence for 13 Repeats including the Carboxy Terminus of CA125
(SEQ ID NO: 49)



4801 GCCATCTGCA CCCACCGCCC TGACCCCACA GGCCCTGGGC TGGACAGAGA
4851 GCAGCTGTAT TTGGAGCTGA GCCAGCTGAC CCACAGCATC ACTGAGCTGG
4901 GCCCCTACAC ACTGGACAGG GACAGTCTCT ATGTCAATGG TTTCACCCAT
4951 CGGAGCTCTG TACCCACCAC CAGCACCGGG GTGGTCAGCG AGGAGCCATT
5001 CACACTGAAC TTCACCATCA ACAACCTGCG CTACATGGCG GACATGGGCC
5051 AACCCGGCTC CCTCAAGTTC AACATCACAG ACAACGTCAT GAAGCACCTG
5101 CTCAGTCCTT TGTTCCAGAG GAGCAGCCTG GGTGCACGGT ACACAGGCTG

5151 CAGGGTCATC GCACTAAGGT CTGTGAAGAA CGGTGCTGAG ACACGGGTGG 
5201 ACCTCCTCTG CACCTACCTG CAGCCCCTCA GCGGCCCAGG TCTGCCTATC
5251 AAGCAGGTGT TCCATGAGCT GAGCCAGCAG ACCCATGGCA TCACCCGGCT 
5301 GGGCCCCTAC TCTCTGGACA AAGACAGCCT CTACCTTAAC GGTTACAATG 
5351 AACCTGGTCT AGATGAGCCT CCTACAACTC CCAAGCCAGC CACCACATTC 
5401 CTGCCTCCTC TGTCAGAAGC CACAACAGCC ATGGGGTACC ACCTGAAGAC 
5451 CCTCACACTC AACTTCACCA TCTCCAATCT CCAGTATTCA CCAGATATGG
5501 GCAAGGGCTC AGCTACATTC AACTCCACCG AGGGGGTCCT TCAGCACCTG

5551 CTCAGACCCT TGTTCCAGAA GAGCAGCATG GGCCCCTTCT ACTTGGGTTG 
5601 CCAACTGATC TCCCTCAGGC CTGAGAAGGA TGGGGCAGCC ACTGGTGTGG 
5651 ACACCACCTG CACCTACCAC CCTGACCCTG TGGGCCCCGG GCTGGACATA
5701 CAGCAGCTTT ACTGGGAGCT GAGTCAGCTG ACCCATGGTG TCACCCAACT 
5751 GGGCTTCTAT GTCCTGGACA GGGATAGCCT CTTCATCAAT GGCTATGCAC 
5801 CCCAGAATTT ATCAATCCGG GGCGAGTACC AGATAAATTT CCACATTGTC 
5851 AACTGGAACC TCAGTAATCC AGACCCCACA TCCTCAGAGT ACATCACCCT 
5901 GCTGAGGGAC ATCCAGGACA AGGTCACCAC ACTCTACAAA GGCAGTCAAC
5951 TACATGACAC ATTCCGCTTC TGCCTGGTCA CCAACTTGAC GATGGACTCC
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TABLE 8 -continued 



Complete DNA Sequence for 13 Repeats including the Carboxy Terminus of CA125
(SEQ ID NO: 49)



6001 GTGTTGGTCA CTGTCAAGGC ATTGTTCTCC TCCAATTTGG ACCCCAGCCT 
6051 GGTGGAGCAA GTCTTTCTAG ATAAGACCCT GAATGCCTCA TTCCATTGGC
6101 TGGGCTCCAC CTACCAGTTG GTGGACATCC ATGTGACAGA AATGGAGTCA 
6151 TCAGTTTATC AACCAACAAG CAGCTCCAGC ACCCAGCACT TCTACCCGAA 
6201 TTTCACCATC ACCAACCTAC CATATTCCCA GGACAAAGCC CAGCCAGGCA 
6251 CCACCAATTA CCAGAGGAAC AAAAGGAATA TTGAGGATGC GCTCAACCAA 
6301 CTCTTCCGAA ACAGCAGCAT CAAGAGTTAT TTTTCTGACT GTCAAGTTTC
6351 AACATTCAGG TCTGTCCCCA ACAGGCACCA CACCGGGGTG GACTCCCTGT 
6401 GTAACTTCTC GCCACTGGCT CGGAGAGTAG ACAGAGTTGC CATCTATGAG 
6451 GAATTTCTGC GGATGACCCG GAATGGTACC CAGCTGCAGA ACTTCACCCT 
6501 GGACAGGAGC AGTGTCCTTG TGGATGGGTA TTCTCCCAAC AGAAATGAGC 
6551 CCTTAACTGG GAATTCTGAC CTTCCCTTCT GGGCTGTCAT CTTCATCGGC 
6601 TTGGCAGGAC TCCTGGGACT CATCACATGC CTGATCTGCG GTGTCCTGGT 
6651 GACCACCCGC CGGCGGAAGA AGGAAGGAGA ATACAACGTC CAGCAACAGT 
6701 GCCCAGGCTA CTACCAGTCA CACCTAGACC TGGAGGATCT GCAATGACTG 
6751 GAACTTGCCG GTGCCTGGGG TGCCTTTCCC CCAGCCAGGG TCCAAAGAAG 
6801 CTTGGCTGGG GCAGAAATAA ACCATATTGG TCG
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ete Amino Acid Sequence for 13 Repeats Contiguous with the Carboxy Terminus 

of CA125 (SEQ ID NO: 50)



ERVLQGLLKP LFRNSSLEYL YSG CRLASLR PEKDSSAMAV DAIC THRPDP 



EDLGLDRERL YWELSNLTNG IQELGPYTLD RNSLYVNGFT HRSSMPTTST 

PGTSTVDVGT SGTPSSSPSP TTAGPLLMPF TLNFTITNLQ YEEDMRRTGS 

2 

RKFNTMERVL QGPLSPIFKN SSVGPLYSG C RLTSLRPEKD GAATGMDAVC 

LYHPNPKRPG LDREQLYWEL SQLTHNITEL GPYSLDRDSL YVNGFTHQNS 

VPTTSTPGTS TVYWATTGTP SSFPGHTEPG PLLIPFTLNF TITNLQYEEN 

3 

MGHPGSRKFN ITERVLQGLL NPIFKNSSVG PLYSGCRLTS LRPEKDGAAT 



GMDAVCLYHP NPKRPGLDRE QLYCELSQLT HNITELGPYS LDRDSLYVNG 

FTHQNSVPTT STPGTSTVYW ATTGTPSSFP GHTEPGPLLI PFTLNFTITN 

4 

LQYEEDMRRT GSRKFNTMER VLQGLLKPLF KSTSVGPLYS G CRLTLLRPE 

KHGAATGVDA IC TLRLDPTG PGLDRERLYW ELSQLTNSVT ELGPYTLDRD 

SLYVNGFTHR SSVPTTSIPG TSAVHLETSG TPASLPGHTA PGPLLVPFTL 

NFTITNLQYE EDMRHPGSRK FNTTERVLQG LLKPLFKSTS VGPLYSGCRL 
5 

TLLRPEKRGA ATGVDTIC TH RLDPLNPGLD REQLYWELSK LTRGIIELGP 

YLLDRGSLYV NGFTHRNFVP ITSTPGTSTV HLGTSETPSS LPRPIVPGPL 

LIPFTLNFTI TNLQYEENMG HPGSRKFNIT ERVLQGLLKP LFRNSSLEYL 
6 

YSGCRLASLR PEKDSSAMAV DAICTHRPDP EDLGLDRERL YWELSNLTNG 



IQELGPYTLD RNSLYVNGFT HRSSMPTTST PGTSTVDVGT SGTPSSSPSP 

TTAGPLLMPF TLNFTITNLQ YEEDMRRTGS RKFNTMESVL QGLLKPLFKN 

7 

TSVGPLYSG C RLTLLRPKKD GAATGVDAIC THRLDPKSPG LNREQLYWEL 



SKLTNDIEEV GPYTLDRNSL YVNGFTHRSF VAPTSTLGTS TVDLGTSGTP 

SSLPSPTTGV PLLIPFTLNF TITNLQYEEN MGHPGSRKFN IMERVLQGLL 

8 

SPIFKNSSVG SLYSGCRLTL LRPEKDGAAT RVDAVCTHRP DPKSPGLDRE 



RLYWKLSQLT HGIIELGPYT LDRHSFYVNG FTHQSSMTTT RTPDTSTMHL 
ATSRTPASLS GPTTASPLLV LFTINFTITN QRYEENMHHP GSRKFNTTER 
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TABLE 9 -continued 



Complete Amino Acid Sequence for 13 Repeats Contiguous with the Carboxy Terminus 
of CA125 (SEQ ID NO: 50)

9 

VLQGLLRPVF KNTSVGPLYS G CRLTLLRPK KDGAATKVDA IC TYRPDPKS 
PGLDREQLYW ELSQLTHSIT ELGPYTQDRD SLYVNGFTHR SSVPTTSIPG

TSAVHLETSG TPASLPGPSA ASPLLVLFTL NFTITNLRYE ENMQHPGSRK 
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20 






10 

FNTTERVLQG LLRSLFKSTS VGPLYSG CRL TLLRPEKDGT ATGVDAIC TH 

HPDPKSPRLD REQLYWELSQ LTHNITELGH YALDNDSLFV NGFTHRSSVS 

TTSTPGTPTV YLGASKTPAS IFGPSAASHL LILFTLNFTI TNLRYEENMW 

11 

PGSRKFNTTE RVLQGLLRPL FKNTSVGPLY SG SRLTLLRP EKDGEATGVD 
AICTHRPDPT GPGLDREQLY LELSQLTHSI TELGPYTLDR DSLYVNGFTH 
RSSVPTTSTG VVSEEPFTLN FTINNLRYMA DMGQPGSLKF NITDNVMKHL
LSPLFQRSSL GARYTGCRVI ALRSVKNGAE TRVDLLCTYL QPLSGPGLPI

12

KQVFHELSQQ THGITRLGPY SLDKDSLYLN GYNEPGLDEP PTTPKPATTF

LPPLSEATTA MGYHLKTLTL NFTISNLQYS PDMGKGSATF NSTEGVLQHL

13 

LRPLFQKSSM GPFYLG CQLI SLRPEKDGAA TGVDTTC TYH PDPVGPGLDI 
QQLYWELSQL THGVTQLGFY VLDRDSLFIN GYAPQNLSIR GEYQINFHIV 
NWNLSNPDPT SSEYITLLRD IQDKVTTLYK GSQLHDTFRF CLVTNLTMDS 
VLVTVKALFS SNLDPSLVEQ VFLDKTLNAS FHWLGSTYQL VDIHVTEMES
SVYQPTSSSS TQHFYLNFTI TNLPYSQDKA QPGTTNYQRN KRNIEDALNQ

LFRNSSIKSY FSDCQVSTFR SVPNRHHTGV DSLCNFSPLA RRVDRVAIYE 
EFLRMTRNGT QLQNFTLDRS SVLVDGYSPN RNEPLTGNSD LPFWAVILIG 
LAGLLGLITC LICGVLVTTR RRKKEGEYNV QQQCPGYYQS HLDLEDLQ 






TABLE 10A 



5' Primer Sequence for End of the Open Reading Frame for Contig #32
19 Cosmid AC008734 (SEQ ID NO: 51), Primer Sequence from within the Repeat Region
(SEQ ID NO: 61), Anti-Sense Primer Sequence for CA125 (SEQ ID NO: 62), and
5' Sense Primer Sequence (from Ambion) (SEQ ID NO: 63) and Anti-Sense Primer
Specific to CA125 (SEQ ID NO: 64)

(SEQ ID NO: 51) 5'-CAGCAGAGACCAGCACGAGTACTC-3'
(SEQ ID NO: 52) 5'-TCCACTGCCATGGCTGAGCT-3'

Primer Sets

(SEQ ID NO: 53) (Set 1) 5'-CCAGCACAGCTCTTCCCAGGAC-3'
(SEQ ID NO: 54) 5'-GGAATGGCTGAGCTGACGTCTG-3'
(SEQ ID NO: 55) (Set 2) 5'-CTTCCCAGGACAACCTCAAGG-3'
(SEQ ID NO: 56) 5'-GCAGGATGAGTGAGCCACGTG-3'
(SEQ ID NO: 57) (Set 3) 5'-GTCAGATCTGGTGACCTCACTG-3'
(SEQ ID NO: 58) 5'-GAGGCACTGGAAAGCCCAGAG-3'

(SEQ ID NO: 59) 5'-CTGATGGCATTATGGAACACATCAC-3'
(SEQ ID NO: 60) 5'-CCCAGAACGAGAGACCAGTGAG-3'
(SEQ ID NO: 61) 5'-GCTGATGGCGATGAATGAACACTG-3'
(SEQ ID NO: 62) 5'-CCCAGAACGAGAGACCAGTGAG-3'
(SEQ ID NO: 63) 5'-CGCGGATCCGAACACTGCGTTTGCTGGCTTTGATG-3'
(SEQ ID NO: 64) 5'-CCTCTGTGTGCTGCTTCATTGGG-3'
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TABLE 10B 



Used to Order the CA125 Carboxy Terminal Domain 



(SEQ ID NO: 303) 5'-GGACAAGGTCACCACACTCTAC-3'
(SEQ ID NO: 304) 5'-GCAGATCCTCCAGGTCTAGGTGTG-3'
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TABLE 10C 



Sense and Anti-Sense Primers Used to Amplify Overlapping Sequences 

in the Repeat Domain 
(SEQ ID NO: 305 and SEQ ID NO: 306, respectively)

(SEQ ID NO: 305) 5'-GTC TCT ATG TCA ATG GTT TCA CCC-3'
(SEQ ID NO: 306) 5'-TAG CTG CTC TCT GTC CAG TCC-3' 
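
As an illustrative check that is not part of the original disclosure, the short Python sketch below computes the reverse complement of the anti-sense primer SEQ ID NO: 306 and searches for it in one 50-base row of SEQ ID NO: 49 (the row of Table 8 that starts at position 3901). The helper name `reverse_complement` and the variable names are assumptions introduced for this example only.

```python
# Illustrative sketch: find where an anti-sense primer would anneal on the sense strand.
# Sequences are copied from Table 10C (SEQ ID NO: 306) and Table 8 (SEQ ID NO: 49,
# row starting at position 3901). All names below are hypothetical helpers.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


antisense_primer = "TAGCTGCTCTCTGTCCAGTCC"  # SEQ ID NO: 306 with spaces removed
row_3901 = "CCTGGACTGGACAGAGAGCAGCTATACTGGGAGCTGAGCCAGCTAACCCA"  # SEQ ID NO: 49, 3901-3950

target = reverse_complement(antisense_primer)
offset = row_3901.find(target)  # 0-based offset within the row, -1 if absent

print(target)
print(offset)
```

A non-negative offset indicates that the primer's complement occurs in that row; an offset of 3, for instance, would correspond to nucleotide 3904 of SEQ ID NO: 49. Because the repeat units are highly similar, the same search over the full sequence may report further matches.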
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5- Sense Primer 1 Sequence and 3' Antisense Primer 2 
(SEQ ID NO: 65 and SEQ ID NO: 66, respectively), and

Nucleotide and Amino Acid Sequences of the CA125 Repeat Expressed in E. coli
(SEQ ID NO: 67 and SEQ ID NO: 68, respectively)
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(SEQ ID NO: 65) 5'-ACCGGATCCATGGGCCACACAGAGCCTGGCCC-3' 
(SEQ ID NO: 66) 5'-TGTAAGCTTAGGCAGGGAGGATGGAGTCC-3' 
(SEQ ID NO: 67) 

1 ATGAGAGGAT CGCATCACCA TCACCATCAC GGATCCATGG GCCACACAGA 
51 GCCTGGCCCT CTCCTGATAC CATTCACTTT CAACTTTACC ATCACCAACC 
101 TGCATTATGA GGAAAACATG CAACACCCTG GTTCCAGGAA GTTCAACACC 
151 ACGGAGAGGG TTCTGCAGGG TCTGCTCAAG CCCTTGTTCA AGAACACCAG 
201 TGTTGGCCCT CTGTACTCTG GCTGCAGACT GACCTTGCTC AGACCTGAGA 
251 AGCATGAGGC AGCCACTGGA GTGGACACCA TCTGTACCCA CCGCGTTGAT 
301 CCCATCGGAC CTGGACTGGA CAGAGAGCGG CTATACTGGG AGCTGAGCCA 
351 GCTGACCAAC AGCATCACAG AGCTGGGACC CTACACCCTG GACAGGGACA
401 GTCTCTATGT CAATGGCTTC AACCCTCGGA GCTCTGTGCC AACCACCAGC 
451 ACTCCTGGGA CCTCCACAGT GCACCTGGCA ACCTCTGGGA CTCCATCCTC 
501 CCTGCCT 



(SEQ ID NO: 68) 

MRGSHHHHHHG SMGHTEPGPLLI PFTFNFTITNL
HYEENMQHPGS RKFNTTERVLQG LLKPLFKNTSV
GPLYSGCRLTL LRPEKHEAATGV DTICTHRVDPI
GPGLDRERLYW ELSQLTNSITEL GPYTLDRDSLY
VNGFNPRSSVP TTSTPGTSTVHL ATSGTPSSLP
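
As a reading aid only, the following Python sketch translates the first printed row (positions 1-50) of SEQ ID NO: 67 with the standard genetic code, so that the result can be compared with the opening residues of SEQ ID NO: 68. The helper `translate` and the way the codon table is built are assumptions made for this example; they are not taken from the specification.

```python
# Illustrative sketch: translate the start of SEQ ID NO: 67 with the standard genetic code.
# The codon table uses the conventional TCAG ordering; `translate` is a hypothetical helper.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring spaces and any trailing partial codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First row (positions 1-50) of SEQ ID NO: 67 as printed in Table 11.
row_1 = "ATGAGAGGAT CGCATCACCA TCACCATCAC GGATCCATGG GCCACACAGA"

print(translate(row_1))
```

The 50 bases give 16 complete codons, and the translated residues can be checked against the start of SEQ ID NO: 68 (MRGSHHHHHHG SMGHT...), including its run of six histidines. Translating all 507 bases the same way should give 169 residues, the combined length of the rows of SEQ ID NO: 68.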
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Additional Multiple Repeat Amino Acid Sequences 
(SEQ ID NO: 69 thru SEQ ID NO: 80) 



(SEQ ID NO: 69) 

ERVLQGLLGP MFKNTSVGLL YSG CRLTLLR PKKDGAATKV DAICTYRPDP 
KSPGLDREQL YWELSQLTHS ITELGPYTLD RDSLYVNGFT QRSSVPTTSI 
PGTPTVDLGT SGTPVSKPGP SAASPLLIPF TINFTITNLR YEENMGHPGS
RKFNIMERVL QGLLKPLFKN TSVGPLYSGC RLTLLRPKKD GAATGVDAIC
THRLDPKSPG LNREQLYWEL SKLTNDIEEL GPYTLDRNSL YVKGFTHQSS
VSTTSTPGTS TVDLRTSGTP SSLSSPTIMA AGPLLIPFTI NFTITNLRYE
ENMHHPGSRK FNTMERVLQG LLMPLFKNTS VSSLYSG CRL TLLRPEKDGA
ATRVDAVCTH RPDPKSPGLD RERLYWKLSQ LTHGITELGP YTLDRNSLYV
NGFTHRSSMP TTSTPGTSTV DVGTSGTPSS SPSPTTAGPL LMPFTLNFTI

TNLQYEEDMR RTGSRKFNTM ERVLQGLLKP LFKSTSVGPL YSG CRLTLLR 
PEKHGAATGV DAIC TLRLDP TGPGLDRERL YWELSQLTNS VTELGPYTLD 
RDSLYVNGFT HRSSVPTTSI PGTSAVHLET SGTPASLPGH TAPGPLLIPF 
TLNFTITNLH YEENMQHPGS RKFNTMERVL QGCLVPCSRN TNVGLLYSGC 
RLTLLRXEKX XAATXVDXXC XXXXDPXXPG LDREXLYWEL SXLTXXIXEL
GPYTLDRNSL YVNGFTHRSS VAPTSTPGTS TVDLGTSGTP SSLPSPTTVP 
LLVPFTLNFT ITNLQYGEDM RHPGSRKFNT TERVLQGLLG PLFKNSSVGP 
T.vsnraT.TST. RSEKDGAATG VDAICTHHLN PQSPGLDREQ LYWQLSQVTN 
GIKELGPYTL DRNSLYVNGF THRSSGLTTS TPWTSTVDLG TSGTPSPVPS

PTTAGPLLI 
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TABLE 12 -continued 



Additional Multiple Repeat Amino Acid Sequences 
(SEQ ID NO: 6 9 through SEQ ID NO : 80) 

(SEQ ID NO: 70) 

QGLLGPMFKN TSVGLLYSG C RLTLLRPEKR GAATGVDTIC THRLDPLNPG 
LDREQLYWEL SKLTRGIIEL GPYLLDRGSL YVNGFTHRNF VPITSTPGTS 
TVHLGTSETP SSLPRPIVPG PLLVPFTLNF TITNLQYEEA MRHPGSRKFN 
TTERVLQGLL RPLFKNTSVS SLYSG CRLTL LRPEKDGAAT RVDAAC TYRP 
DPKSPGLDRE QLYWELSQLT HSITELGPYT LDRVSLYVNG FNPRSSVPTT 
STPGTSTVHL ATSGTPSSLP GHTAPVPLLI PFTLNFTITN LQYEEDMRHP 
GSRKFNTMER VLQGLLRPLF KNTSIGPLYS S CRLTLLRPE KDKAATRVDA 
ICTHHPDPQS PGLNREQLYW ELSQLTHGIT ELGPYTLDRD SLYVDGFTHW 
SPIPTTSTPG TSIVNLGTSG IPPSLPETTA TGPLLIPFTP NFTITNLQYE 
EDMRRTGSRK FNTMERVLQG LLSPIFKNSS VGPLYSG CRL TSLRPEKDGA 
ATGMDAVCLY HPNPKRPGLD REQLY 

(SEQ ID NO:71) 

ERVLQGLLKP LFKSTSVGPL YSG CRLTLLR PEKDGVATRV DAIC THRPDP 
KIPGLDRQQL YWELSQLTHS ITELGPYTLD RDSLYVNGFT QRSSVPTTST 
PGTFTVQPET SETPSSLPGP TATGPVLLPF TLNFTIINLQ YEEDMHRPGS 
RKFNTTERVL QGLLMPLFKN TSVGPLYSG C RLTLLRPEKQ EAATGVDTIC 
THRLDPSEPG LDREQLYWEL SQLTNSITEL GPYTLDRDSL YVNGFTHSGV 
LCPPPSILGI FTVQPETFET PSSLPGPTAT GPVLLPFTLN FTIINLQYEE 
DMHRPGSRKF NTTERVLQGL LTPLFKNTSV GPLYSG CRLT LLRPEKQEAA 
TGVDTICTHR VDPIGPGLDR ERLYWELSQL TNSITELGPY TLDRDSLYVN 
GFNPWSSVPT TSTPGTSTVH LATSGTPSSL PGHTAPVPLL IPFTLNFTIT 
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Additional Multiple Repeat Amino Acid Sequences 
(SEQ ID NO: 69 through SEQ ID NO: 80) 



NLHYEENMQH PGSRKFNTTE RVLQGLLKPL FKSTSVGPLY SGCRLTLLRP 
EKHGAATGVD AIC THRLDPK SPGVDREQLY WELSQLTNGI KELGPYTLDR 
NSLYVNGFTH WIPVPTSSTP GTSTVDLGSG TPSSLPSPTT AGPL 

(SEQ ID NO: 72) 

TSVGPLYSG C RLTLLRSEKD GAATGVDAIY THRLDPKSPG VDREQLYWEL
SQLTNGIKEL GPYTLDRNSL YVNGFTHQTS APNTSTPGTS TVDLGTSGTP
SSLPSPTSAG PLLIPFTINF TITNLRYEEN MHHPGSRKFN TMERVLQGLL
KPLFKSTSVG PLYSG CRLTL LRPEKDGVAT RVDAIC THRP DPKIPGLDRQ
QLYWELSQLT HSITELGPYT LDRDSLYVNG FTQRSSVPTT STPGTFTVQP
ETSETPSSLP GPTATGPVLL PFTLNFTIIN LQYEEDMHRP GSRKFNTTER
VLQGLLKPLF KSTSVGPLYS G CRLTLLRPE KHGAATGVDA IC TLRLDPTG
PGLDRERLYW ELSQLTNSIT ELGPYTLDRD SLYVNGFNPW SSVPTTSTPG
TSTVHLATSG TPSSLPGHTA PVPL

(SEQ ID NO: 73)

ERVLQGLLKP LFKSTSVGPL YSG CRLTLLR PEKRGAATGV DTICTHRLDP
LNPGLDREQL YWELSKLTRG IIELGPYLLD RDSLYVNGFT HRSSVPTTSI
PGTSAVHLET SGTPASLPGH TAPGPLLVPF TLNFTITNLQ YEEDMRHPGS
RKFNTTERVL QGLLKPLFKS TSVGPLYSG C RLTLLRPEKR GAATGVDTIC
THRLDPLNPG LDREQLYWEL SKLTRGIIEL GPYLLDRGSL YVNGFTHRNF
VPITSTPGTS TVHLGTSETP SSLPRPIVPG PLLIPF



TABLE 12 -continued

Additional Multiple Repeat Amino Acid Sequences
(SEQ ID NO: 69 through SEQ ID NO: 80)



(SEQ ID NO: 74) 

ERVLQGLLRP VFKNTSVGPL YSG CRLTLLR PKKDGAATKV DAIC TYRPDP
KSPGLDREQL YWELSQLTHS ITELGPYTLD RDSLYVNGFT QRSSVPTTSI
PGTPTVDLGT SGTPVSKPGP SAASPLLVPF TLNFTITNLQ YEEDMHRPGS
RKFNATERVL QGLLSPIFKN SSVGPLYSG C RLTSLRPEKD GAATGMDAVC
LYHPNPKRPG LDREQLYWEL SQLTHNITEL GPYSLDRDSL YVNGFTHQSS
MTTTRTPDTS TMHLATSRTP ASLSGPTTAS PLLIPF

(SEQ ID NO: 75)

ERVLQGLLKP LFKSTSVGPL YSG CRLTLLR PEKRGAATGV DTIC THRLDP
LNPGLDREQL YWELSKLTRG IIELGPYLLD RGSLYVNGFS RQSSMTTTRT
PDTSTMHLAT SRTPASLSGP TTASPLLIPF TLNFTITNLQ YEENMGHPGS
RKFNIMERVL QGLLNPIFKN SSVGPLYSG C RLTSLKPEKD GAATGMDAVC
LYHPNPKRPG LDREQLYWEL SQLTHGIKEL GPYTLDRNSL YVNGFTHRSS
VAPTSTPGTS TVDLGTSGTP SSLPSPTTAV PLLIPF

(SEQ ID NO: 76)

ERVLQGLLKP LFRNSSLEYL YSG CRLASLR PEKDSSAMAV DAIC THRPDP
EDLGLDRERL YWELSNLTNG IQELGPYTLD RNSLYVNGFT HRSSGLTTST
PWTSTVDLGT SGTPSPVPSP TTAGPLLIPF TLNFTITNLQ YEENMGHPGS
RKFNIMERVL QGLLMPLFKN TSVSSLYSG C RLTLLRPEKD GAATRVDAVC
TQRPDPKSPG LDRERLYWKL SQLTHGITEL GPYTLDRHSL YVNGLTHQSS
MTTTRTPDTS TMHLATSRTP ASLSGPTTAS PLLIPF
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TABLE 12 -continued 



Additional Multiple Repeat Amino Acid Sequences
(SEQ ID NO: 69 through SEQ ID NO: 80)

(SEQ ID NO: 77) 

ERVLQGLLSP ISKNSSVGPL YSGCRLTSLR PEKDGAATGM DAVCLYHPNP
KRPGLDREQL YWELSQLTHN ITELGPYSLD RDSLYVNGFT HQNSVPTTST
PGTSTVYWAT TGTPSSFPGH TEPGPLLIPF TVNFTITNLR YEENMHHPGS
RKFNTTERVL QGLLRPVFKN TSVGPLYSGC RLTLLRPKKD GAATKVDAIC
TYRPDPKSPG LDREQLYWEL SKLTNDIEEL GPYTLDRNSL YVNGFTHQSS
VSTTSTPGTS TVDLRTSGTP SSLSSPTIMA AGPLLIPF

(SEQ ID NO: 78)

ERVLHGLLTP LFKNTRVGPL VSKPRLTLLR PEKQEAATGV DTICTHRVDP
IGPGLDRERL YWELSQLTNS ITELGPYTLD RDSLYVNGFN PWSSVPTTST
PGTSTVHLAT SGTPSSLPGH TAPVPLLIPF TLNFTITNLH YEENMQHPGS
RKFNTTERVL QGLLKPLFKN TSVGPLYSGC_^L^L^KP^K1I_EAATCVDAIC 
TLRLDPTGPG LDRQLYWELS QLTNSVTELG PYTLDRDSLY VNGFTHRSSV 
PTTSIPGTSA VHLETSGTPA SLPGHTAPGP LLIPFTLNFT ITNLQYEEDM
RRTGSRKFNT MERVLQGLLK PLFKSTSVGP LYSGCRLTLL RPEKRGAATG
VDTICTHRLD PLNPGLDREQ LYWELSKLTR GIIELGPYLL DRGSLYVNGF
THRNFVPITS TPGTSTVHLG TSETPSSLPR PIVPGPLLIP FTINFTITNL
RYEENMHHPG SRKFNIMERV LQGLLGPLFK NSSVGPLYSG CRLISLRSEK
DGAATGVDAI C THHLNPQSP GLDREQLYWQ LSQMTNGIKE LGPYTLDRNS 
LYVNGFTHRS SGLTTSTPWT STVDLGTSGT PSPVPSPTTA GPLLIPF 
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TABLE 12 -continued 



Additional Multiple Repeat Amino Acid Sequences
(SEQ ID NO: 69 through SEQ ID NO: 80)



(SEQ ID NO: 79) 

GPLYSGCRLT SLRPEKDGAA TGMDAVCLYH PNPKRPGLDR EQLYWELSQL
THNITELGPY SLDRDSLYVN GFTHQNSVPT TSTPGTSTVY WATTGTPSSF
PGHTEPGPLL IPFTLNFTIT NLQYEENMGH PGSRKFNITE SVLQGLLTPL
FKNSSVGPLY SGCRLISLRS EKDGAATGVD AICTHHLNPQ SPGLDREQLY
WQLSQMTNGI KELGPYTLDR DSLYVNGFTH RSLGLTTSTP WTSTVDLGTS
GTPSPVPSPT TAGPLLIPFT LNFTITNLQY EENMGHPGSR KFNIMERVLQ
GLLRPVFKNT SVGPLYSGCR LTLLRPKKDG AATKVDAICT YRPDPKSPGL
DREQLYWELS QLTHSITELG PYTLDRDSLY VNGFTQRSSV PTTSIPGTPT
VDLGTSGTPV SKPGPSAASP



(SEQ ID NO: 80) 

QLYWELSKLT NDIEELGPYT LDRNSLYVNG FTHQSSVSTT STPGTSTVDL
RTSGTPSSLS SPTIMAAGPL LIPFTLNFTI TNLQYEENMG HPGSRKFNIM
ERVLQGLLGP MFKNTSVGLL YSGCRLTLLR PEKNGAATGM DAICSHRLDP
KSPGLNREQL YWELSQLTHG IKELGPYTLD RNSLYVNGFT HRSSVAPTST
PGTSTVDLGT SGTPSSLPSP TTAVPLLIPF TLNFTITNLK YEEDMHCPGS
RKFNTTERVL QSLFGPMFKN TSVGPLYSGC RLTLLRSEKD GAATGVDAIC
THRLDPKSLG VDREQLYWEL SQLTNGIKEL GPYTLDRNSL YVNGFTHQTS
APNTSTPGTS TVDLGTSGTP SSLPSPTSAG PLLVPFTLNF TITNLQYEED
MRRTGSRKFN TMESVLQGLL KPLFKNTSVG PLYSGCRLTL LRPEKDGAAT
GVDAICTHRL DPKSPGLNRE QLYWELSKL
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TABLE 13 






Amino Terminal Nucleotide Sequence 
(SEQ ID NO: 81) 



1 CAGAGAGCGT TGAGCTGGGA ACAGTGACAA GTGCTTATCA AGTTCCTTCA 

51 CTCTCAACAC GGTTGACAAG AACTGATGGC ATTATGGAAC ACATCACAAA

101 AATACCCAAT GAAGCAGCAC ACAGAGGTAC CATAAGACCA GTCAAAGGCC

151 CTCAGACATC CACTTCGCCT GCCAGTCCTA AAGGACTACA CACAGGAGGG

201 ACAAAAAGAA TGGAGACCAC CACCACAGCT TTGAAGACCA CCACCACAGC

251 TTTGAAGACC ACTTCCAGAG CCACCTTGAC CACCAGTGTC TATACTCCCA

301 CTTTGGGAAC ACTGACTCCC CTCAATGCAT CAAGGCAAAT GGCCAGCACA

351 ATCCTCACAG AAATGATGAT CACAACCCCA TATGTTTTCC CTGATGTTCC

401 AGAAACGACA TCCTCATTGG CTACCAGCCT GGGAGCAGAA ACCAGCACAG

451 CTCTTCCCAG GACAACCCCA TCTGTTCTCA ATAGAGAATC AGAGACCACA

501 GCCTCACTGG TCTCTCGTTC TGGGGCAGAG AGAAGTCCGG TTATTCAAAC

551 TCTAGATGTT TCTTCTAGTG AGCCAGATAC AACAGCTTCA TGGGTTATCC

601 ATCCTGCAGA GACCATCCCA ACTGTTTCCA AGACAACCCC CAATTTTTTC



651 CACAGTGAAT TAGACACTGT ATCTTCCACA GCCACCAGTC ATGGGGCAGA 

701 CGTCAGCTCA GCCATTCCAA CAAATATCTC ACCTAGTGAA CTAGATGCAC 

751 TGACCCCACT GGTCACTATT TCGGGGACAG ATACTAGTAC AACATTCCCA 

801 ACACTGACTA AGTCCCCACA TGAAACAGAG ACAAGAACCA CATGGCTCAC

851 TCATCCTGCA GAGACCAGCT CAACTATTCC CAGAACAATC CCCAATTTTT 

901 CTCATCATGA ATCAGATGCC ACACCTTCAA TAGCCACCAG TCCTGGGGCA 

951 GAAACCAGTT CAGCTATTCC AATTATGACT GTCTCACCTG GTGCAGAAGA 
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TABLE 13 -continued 



Amino Terminal Nucleotide Sequence 
(SEQ ID NO: 81) 






1001 TCTGGTGACC TCACAGGTCA CTAGTTCTGG GACAGACAGA AATATGACTA 

1051 TTCCAACTTT GACTCTTTCT CCTGGTGAAC CAAAGACGAT AGCCTCATTA 

1101 GTCACCCATC CTGAAGCACA GACAAGTTCG GCCATTCCAA CTTCAACTAT 

1151 CTCGCCTGCT GTATCACGGT TGGTGACCTC AATGGTCACC AGTTTGGCGG 

1201 CAAAGACAAG TACAACTAAT CGAGCTCTGA CAAACTCCCC TGGTGAACCA 

1251 GCTACAACAG TTTCATTGGT CACGCATCCT GCACAGACCA GCCCAACAGT 

1301 TCCCTGGACA ACTTCCATTT TTTTCCATAG TAAATCAGAC ACCACACCTT 

1351 CAATGACCAC CAGTCATGGG GCAGAATCCA GTTCAGCTGT TCCAACTCCA 

1401 ACTGTTTCAA CTGAGGTACC AGGAGTAGTG ACCCCTTTGG TCACCAGTTC

1451 TAGGGCAGTG ATCAGTACAA CTATTCCAAT TCTGACTCTT TCTCCTGGTG 

1501 AACCAGAGAC CACACCTTCA ATGGCCACCA GTCATGGGGA AGAAGCCAGT 

1551 TCTGCTATTC CAACTCCAAC TGTTTCACCT GGGGTACCAG GAGTGGTGAC 

1601 CTCTCTGGTC ACTAGTTCTA GGGCAGTGAC TAGTACAACT ATTCCAATTC 

1651 TGACTTTTTC TCTTGGTGAA CCAGAGACCA CACCTTCAAT GGCCACCAGT

1701 CATGGGACAG AAGCTGGCTC AGCTGTTCCA ACTGTTTTAC CTGAGGTACC 

1751 AGGAATGGTG ACCTCTCTGG TTGCTAGTTC TAGGGCAGTA ACCAGTACAA 
1801 CTCTTCCAAC TCTGACTCTT TCTCCTGGTG AACCAGAGAC CACACCTTCA 
1851 ATGGCCACCA GTCATGGGGC AGAAGCCAGC TCAACTGTTC CAACTGTTTC 
1901 ACCTGAGGTA CCAGGAGTGG TGACCTCTCT GGTCACTAGT TCTAGTGGAG 
1951 TAAACAGTAC AAGTATTCCA ACTCTGATTC TTTCTCCTGG TGAACTAGAA 



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40 






TABLE 13 -continued 



Amino Terminal Nucleotide Sequence 
(SEQ ID NO: 81) 





2001 ACCACACCTT CAATGGCCAC CAGTCATGGG GCAGAAGCCA GCTCAGCTGT
2051 TCCAACTCCA ACTGTTTCAC CTGGGGTATC AGGAGTGGTG ACCCCTCTGG
2101 TCACTAGTTC CAGGGCAGTG ACCAGTACAA CTATTCCAAT TCTAACTCTT
2151 TCTTCTAGTG AGCCAGAGAC CACACCTTCA ATGGCCACCA GTCATGGGGT
2201 AGAAGCCAGC TCAGCTGTTC TAACTGTTTC ACCTGAGGTA CCAGGAATGG
2251 TGACCTCTCT GGTCACTAGT TCTAGAGCAG TAACCAGTAC AACTATTCCA
2301 ACTCTGACTA TTTCTTCTGA TGAACCAGAG ACCACAACTT CATTGGTCAC
2351 CCATTCTGAG GCAAAGATGA TTTCAGCCAT TCCAACTTTA GCTGTCTCCC
2401 CTACTGTACA AGGGCTGGTG ACTTCACTGG TCACTAGTTC TGGGTCAGAG
2451 ACCAGTGCGT TTTCAAATCT AACTGTTGCC TCAAGTCAAC CAGAGACCAT
2501 AGACTCATGG GTCGCTCATC CTGGGACAGA AGCAAGTTCT GTTGTTCCAA
2551 CTTTGACTGT CTCCACTGGT GAGCCGTTTA CAAATATCTC ATTGGTCACC
2601 CATCCTGCAG AGAGTAGCTC AACTCTTCCC AGGACAACCT CAAGGTTTTC
2651 CCACAGTGAA TTAGACACTA TGCCTTCTAC AGTCACCAGT CCTGAGGCAG
2701 AATCCAGCTC AGCCATTTCA ACTACTATTT CACCTGGTAT ACCAGGTGTG
2751 CTGACATCAC TGGTCACTAG CTCTGGGAGA GACATCAGTG CAACTTTTCC
2801 AACAGTGCCT GAGTCCCCAC ATGAATCAGA GGCAACAGCC TCATGGGTTA
2851 CTCATCCTGC AGTCACCAGC ACAACAGTTC CCAGGACAAC CCCTAATTAT
2901 TCTCATAGTG AACCAGACAC CACACCATCA ATAGCCACCA GTCCTGGGGC
2951 AGAAGCCACT TCAGATTTTC CAACAATAAC TGTCTCACCT GATGTACCAG
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TABLE 13 -continued 



Amino Terminal Nucleotide Sequence 
(SEQ ID NO: 81)



3001 ATATGGTAAC CTCACAGGTC ACTAGTTCTG GGACAGACAC CAGTATAACT
3051 ATTCCAACTC TGACTCTTTC TTCTGGTGAG CCAGAGACCA CAACCTCATT
3101 TATCACCTAT TCTGAGACAC ACACAAGTTC AGCCATTCCA ACTCTCCCTG
3151 TCTCCCCTGG TGCATCAAAG ATGCTGACCT CACTGGTCAT CAGTTCTGGG
3201 ACAGACAGCA CTACAACTTT CCCAACACTG ACGGAGACCC CATATGAACC
3251 AGAGACAACA GCCATACAGC TCATTCATCC TGCAGAGACC AACACAATGG
3301 TTCCCAAGAC AACTCCCAAG TTTTCCCATA GTAAGTCAGA CACCACACTC
3351 CCAGTAGCCA TCACCAGTCC TGGGCCAGAA GCCAGTTCAG CTGTTTCAAC
3401 GACAACTATC TCACCTGATA TGTCAGATCT GGTGACCTCA CTGGTCCCTA
3451 GTTCTGGGAC AGACACCAGT ACAACCTTCC CAACATTGAG TGAGACCCCA
3501 TATGAACCAG AGACTACAGT CACGTGGCTC ACTCATCCTG CAGAAACCAG
3551 CACAACGGTT TCTGGGACAA TTCCCAACTT TTCCCATAGG GGATCAGACA
3601 CTGCACCCTC AATGGTCACC AGTCCTGGAG TAGACACGAG GTCAGGTGTT
3651 CCAACTACAA CCATCCCACC CAGTATACCA GGGGTAGTGA CCTCACAGGT
3701 CACTAGTTCT GCAACAGACA CTAGTACAGC TATTCCAACT TTGACTCCTT
3751 CTCCTGGTGA ACCAGAGACC ACAGCCTCAT CAGCTACCCA TCCTGGGACA
3801 CAGACTGGCT TCACTGTTCC AATTCGGACT GTTCCCTCTA GTGAGCCAGA
3851 TACAATGGCT TCCTGGGTCA CTCATCCTCC ACAGACCAGC ACACCTGTTT
3901 CCAGAACAAC CTCCAGTTTT TCCCATAGTA GTCCAGATGC CACACCTGTA
3951 ATGGCCACCA GTCCTAGGAC AGAAGCCAGT TCAGCTGTAC TGACAACAAT
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TABLE 13 -continued 



Amino Terminal Nucleotide Sequence 
(SEQ ID NO: 81) 





4001 


CTCACCTGGT 


GCACCAGAGA 


TGGTGACTTL 




APTTPTGGGG 


10 


4051 


CAGCAACCAG 


TACAACTGTT 


CCAACTTTGA 


L. 1 1 11.J- tt 


TPPTATGPCA 




4101 


G AG AC C AC AG 


CCTTATTGAG 


CACCLA1CLL. 


7\ p a a r 1 a ppp A 


PAAGTAAAAC 




4151 


ATTTCCTGCT 


TCAACTGTGT 


TTCCTLAAbl 


ATPAPAPAPP 


APAGPPTCAC 


15 


4201 


TCACCATTAG 


ACCTGGTGCA 


GAGACTAGLA 


LALiv- 1L1 L-v-^- 


AAPTPAGACA 




4251 


ACATCCTCTC 


TCTTCACCCT 


ACTTGTAACT 


Pp7\ A PP A P P A 


PAPTTPATPT 




4301 


AAGTCCAACT 


GCTTCACCTG 


GTGTTTCTGC 


A A A A AnSPfP 


PPAPTTTPPA 


& 


4351 


CCCATCCAGG 


GACAGAGACC 


AGCACAATGA 


TTPfA A PTTP 
1 lit 


A APTPTTTCC 
x x x x v^v- 


If! 


4401 


CTTGGTTTAC 


TAGAGACTAC 


AGGCTTACTG 


CjLLALUALxU 1 


PTTPAPP APA 




4451 


GACCAGCACG 


AGTACTCTAA 


CTCTGACTGT 


1 1 LLL-U HaU 1 


w JL \— X v— X uuuu 


[0 


4501 


TTTCCAGTGC 


CTCTATAACA 


ACTGATAAGC 


r^O/^A A A PTPT 


PAPPTPPTGP 


1 


4551 


AACACAGAAA 


CCTCACCATC 


TGTAACTTCA 


/-i mrp/iri 7V ppPP 

(jl 1(jOAH~CI~ 


PAPA ATTTTP 


a 


4601 


CAGGACTGTC 


ACAGGCACCA 


CTATGACCTT 


P7\mT\ PP A TP A 

CjA 1 A I- C A 1 UA 


PAPATPPPAA 


35 


4651 


CACCACCTAA 


AACCAGTCAT 


GGAGAAGGAG 


■"POAf^TPPA AP 


P APTATPTTP 


4701 


AGAACTACAA 


TGGTTGAAGC 


CACTAATTTA 


ppmj\ PPAPAP 

CjtL 1 AH-AUAvj 


PTTPPAPTPP 




4751 CACTGTGGCC AAGACAACAA CCACCTTCAA TACACTGGCT GGAAGCCTCT
4801 TTACTCCTCT GACCACACCT GGGATGTCCA CCTTGGCCTC TGAGAGTGTG
4851 ACCTCAAGAA CAAGTTATAA CCATCGGTCC TGGATCTCCA CCACCAGCAG
4901 TTATAACCGT CGGTACTGGA CCCCTGCCAC CAGCACTCCA GTGACTTCTA
4951 CATTCTCCCC AGGGATTTCC ACATCCTCCA TCCCCAGCTC CACAGCAGCC
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TABLE 13 -continued 



Amino Terminal Nucleotide Sequence 
(SEQ ID NO: 81) 



5001 ACAGTCCCAT TCATGGTGCC ATTCACCCTC AACTTCACCA TCACCAACCT
5051 GCAGTACGAG GAGGACATGC GGCACCCTGG TTCCAGGAAG TTCAACGCCA
5101 CAGAGAGAGA ACTGCAGGGT CTGCTCAAAC CCTTGTTCAG GAATAGCAGT
5151 CTGGAATACC TCTATTCAGG CTGCAGACTA GCCTCACTCA GGCCAGAGAA
5201 GGATAGCTCA GCCATGGCAG TGGATGCCAT CTGCACACAT CGCCCTGACC
5251 CTGAAGACCT CGGACTGGAC AGAGAGCGAC TGTACTGGGA GCTGAGCAAT
5301 CTGACAAATG GCATCCAGGA GCTGGGCCCC TACACCCTGG ACCGGAACAG
5351 TCTCTATGTC AATGGTTTCA CCCATCGAAG CTCTATGCCC ACCACCAGCA
5401 CTCCTGGGAC CTCCACAGTG GATGTGGGAA CCTCAGGGAC TCCATCCTCC
5451 AGCCCCAGCC CCACG
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TABLE 14 



Amino Terminal Protein Sequence 
(SEQ ID NO: 82) 





1 ESVLEGTVTS AYQVPSLSTR LTRTDGIMEH ITKIPNEAAH RGTIRPVKGP
51 QTSTSPASPK GLHTGGTKRM ETTTTALKTT TTALKTTSRA TLTTSVYTPT
101 LGTLTPLNAS RQMASTILTE MMITTPYVFP DVPETTSSLA TSLGAETSTA
151 LPRTTPSVLN RESETTASLV SRSGAERSPV IQTLDVSSSE PDTTASWVIH
201 PAETIPTVSK TTPNFFHSEL DTVSSTATSH GADVSSAIPT NISPSELDAL
251 TPLVTISGTD TSTTFPTLTK SPHETETRTT WLTHPAETSS TIPRTIPNFS
301 HHESDATPSI ATSPGAETSS AIPIMTVSPG AEDLVTSQVT SSGTDRNMTI
351 PTLTLSPGEP KTIASLVTHP EAQTSSAIPT STISPAVSRL VTSMVTSLAA
401 KTSTTNRALT NSPGEPATTV SLVTHPAQTS PTVPWTTSIF FHSKSDTTPS
451 MTTSHGAESS SAVPTPTVST EVPGVVTPLV TSSRAVISTT IPILTLSPGE
501 PETTPSMATS HGEEASSAIP TPTVSPGVPG VVTSLVTSSR AVTSTTIPIL
551 TFSLGEPETT PSMATSHGTE AGSAVPTVLP EVPGMVTSLV ASSRAVTSTT
601 LPTLTLSPGE PETTPSMATS HGAEASSTVP TVSPEVPGVV TSLVTSSSGV
651 NSTSIPTLIL SPGELETTPS MATSHGAEAS SAVPTPTVSP GVSGVVTPLV
701 TSSRAVTSTT IPILTLSSSE PETTPSMATS HGVEASSAVL TVSPEVPGMV
751 TSLVTSSRAV TSTTIPTLTI SSDEPETTTS LVTHSEAKMI SAIPTLAVSP
801 TVQGLVTSLV TSSGSETSAF SNLTVASSQP ETIDSWVAHP GTEASSVVPT
851 LTVSTGEPFT NISLVTHPAE SSSTLPRTTS RFSHSELDTM PSTVTSPEAE
901 SSSAISTTIS PGIPGVLTSL VTSSGRDISA TFPTVPESPH ESEATASWVT
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TABLE 14 -continued 



Amino Terminal Protein Sequence 
(SEQ ID NO: 82) 






951 HPAVTSTTVP RTTPNYSHSE PDTTPSIATS PGAEATSDFP TITVSPDVPD
1001 MVTSQVTSSG TDTSITIPTL TLSSGEPETT TSFITYSETH TSSAIPTLPV
1051 SPGASKMLTS LVISSGTDST TTFPTLTETP YEPETTAIQL IHPAETNTMV
1101 PRTTPKFSHS KSDTTLPVAI TSPGPEASSA VSTTTISPDM SDLVTSLVPS
1151 SGTDTSTTFP TLSETPYEPE TTATWLTHPA ETSTTVSGTI PNFSHRGSDT
1201 APSMVTSPGV DTRSGVPTTT IPPSIPGVVT SQVTSSATDT STAIPTLTPS
1251 PGEPETTASS ATHPGTQTGF TVPIRTVPSS EPDTMASWVT HPPQTSTPVS
1301 RTTSSFSHSS PDATPVMATS PRTEASSAVL TTISPGAPEM VTSQITSSGA
1351 ATSTTVPTLT HSPGMPETTA LLSTHPRTET SKTFPASTVF PQVSETTASL
1401 TIRPGAETST ALPTQTTSSL FTLLVTGTSR VDLSPTASPG VSAKTAPLST
1451 HPGTETSTMI PTSTLSLGLL ETTGLLATSS SAETSTSTLT LTVSPAVSGL
1501 SSASITTDKP QTVTSWNTET SPSVTSVGPP EFSRTVTGTT MTLIPSEMPT
1551 PPKTSHGEGV SPTTILRTTM VEATNLATTG SSPTVAKTTT TFNTLAGSLF
1601 TPLTTPGMST LASESVTSRT SYNHRSWIST TSSYNRRYWT PATSTPVTST
1651 FSPGISTSSI PSSTAATVPF MVPFTLNFTI TNLQYEEDMR HPGSRKFNAT
1701 ERELQGLLKP LFRNSSLEYL YSGCRLASLR PEKDSSAMAV DAICTHRPDP
1751 EDLGLDRERL YWELSNLTNG IQELGPYTLD RNSLYVNGFT HRSSMPTTST
1801 PGTSTVDVGT SGTPSSSPSP
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TABLE 15 






CA125 Repeat Nucleotide Sequence
(SEQ ID NO: 83 thru SEQ ID NO: 145) 

(SEQ ID NO: 83)

1 GCCACAGTCC CATTCATGGT GCCATTCACC CTCAACTTCA CCATCACCAA

51 CCTGCAGTAC GAGGAGGACA TGCGGCACCC TGGTTCCAGG AAGTTCAACG 

101 CCACAGAGAG AGAACTGCAG GGTCTGCTCA AACCCTTGTT CAGGAATAGC 

151 AGTCTGGAAT ACCTCTATTC AGGCTGCAGA CTAGCCTCAC TCAGGCCAGA 

201 GAAGGATAGC TCAGCCATGG CAGTGGATGC CATCTGCATA CATCGCCCTG 

251 ACCCTGAAGA CCTCGGACTG GACAGAGAGC GACTGTACTG GGAGCTGAGC 

301 AATCTGACAA ATGGCATCCA GGAGCTGGGC CCCTACACCC TGGACCGGAA

351 CAGTCTCTAT GTCAATGGTT TCACCCATCG AAGCTCTATG CCCACCACCA

401 GCACTCCTGG GACCTCCACA GTGGATGTGG GAACCTCAGG GACTCCATCC

451 TCCAGCCCCA GCCCCACG 

| (SEQ ID i M0 ^ TGGCC ctcTCCTGAT GCCGTTCACC CTCAACTTCA CCATCACCAA 

51 CCTGCAGTAC GAGGAGGACA TGCGTCGCAC TGGCTCCAGG AAGTTCAACA 

101 CCATGGAGAG TGTCCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAACACC 

151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA TTGACCTTGC TCAGGCCCAA 

201 GAAAGATGGG GCAGCCACTG CAGTGGATGC CATCTGCACC CACCGCCTTG 

251 ACCCCAAAAG CCCTGGACTC AACAGGGAGC AGCTGTACTG GGAGCTAAGC 

301 AAACTGACCA ATGACATTGA AGAGCTGGGC CCCTACACCC TGGACAGGAA

351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GAGCTCTGTG TCCACCACCA 

401 GCACTCCTGG GACCTCCACA GTGGATCTCA GAACCTCAGG GACTCCATCC 






451 TCCCTCTCCA GCCCCACAAT TATG



(SEQ ID NO: 85) tggcc ctctcctggt ACCATTCACC CTCAACTTCA CCATCACCAA
51 CCTGCAGTAT GGGGAGGACA TGGGTCACCC TGGCTCCAGG AAGTTCAACA
101 CCACAGAGAG GGTCCTGCAG GGTCTGCTTG GTCCCATATT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTCTC TCAGGTCTGA
201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCATC CATCATCTTG
251 ACCCCAAAAG CCCTGGACTC AACAGAGAGC GGCTGTACTG GGAGCTGAGC
301 CAACTGACCA ATGGCATCAA AGAGCTGGGC CCCTACACCC TGGACAGGAA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCG GACCTCTGTG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGGACCTTG GAACCTCAGG GACTCCATTC
451 TCCCTCCCAA GCCCCGCA



(SEQ n™^^ CTCTCCTGGT GCTGTTCACC CTCAACTTCA CCATCACCAA
51 CCTGAAGTAT GAGGAGGACA TGCATCGCCC TGGCTCCAGG AAGTTCAACA
101 CCACTGAGAG GGTCCTGCAG ACTCTGCTTG GTCCTATGTT CAAGAACACC
151 AGTGTTGGCC TTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGTCCGA
201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCACC CACCGTCTTG
251 ACCCCAAAAG CCCTGGACTG GACAGAGAGC AGCTATACTG GGAGCTGAGC
301 CAGCTGACCA ATGGCATCAA AGAGCTGGGC CCCTACACCC TGGACAGGAA







TABLE 15 -continued 



CA125 Repeat Nucleotide Sequence
(SEQ ID NO: 83 thru SEQ ID NO: 145) 







351 CAGTCTCTAT GTCAATGGTT TCACCCATTG GATCCCTGTG CCCACCAGCA
401 GCACTCCTGG GACCTCCACA GTGGACCTTG GGTCAGGGAC TCCATCCTCC
451 CTCCCCAGCC CCACA


(SEQ ID NO: 87)

1 GCTGCTGGCC CTCTCCTGGT GCCATTCACC CTCAACTTCA CCATCACCAA
51 CCTGCAGTAC GAGGAGGACA TGCATCACCC AGGCTCCAGG AAGTTCAACA
101 CCACGGAGCG GGTCCTGCAG GGTCTGCTTG GTCCCATGTT CAAGAACACC
151 AGTGTCGGCC TTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGTCCGA
201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCACC CACCGTCTTG
251 ACCCCAAAAG CCCTGGAGTG GACAGGGAGC AGCTATACTG GGAGCTGAGC
301 CAGCTGACCA ATGGCATCAA AGAGCTGGGT CCCTACACCC TGGACAGAAA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GACCTCTGCG CCCAACACCA
401 GCACTCCTGG GACCTCCACA GTGGACCTTG GGACCTCAGG GACTCCATCC
451 TCCCTCCCCA GCCCTACA


35 


(SEQ 


ID NO: 88) 

1 NCNNCTGNCC 


CTCTCCTGNT 


NCCNTTCACC 


NTCAACTTNA 


CCATCACCAA 






51 


CCTGCANTAN 


GNGGANNALA 


"ppptstmpmppp 




AAGTTCAACA 


40 




101 


CCACNGAGNG 


NGTNCTGCAG 


GGTCTGCTNN 


NNCCCNTNTT 


CAAGAACACC 






151 


AGTGTTGGCC 


CTCTGTACTC 


TGGCTGCAGA 


CTGACCTTGC 


TCAGGTCCGA 


45 




201 


GAAGGATGGA 


GCAGCCACTG 


GAGTGGATGC 


CATCTGCACC 


CACCGTCTTG 




251 


ACCCCAAAAG 


CCCTGGAGTG 


GACAGGGAGC 


AGCTATACTG 


GGAGCTGAGC 



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TABLE 15 -continued 



CA125 Repeat Nucleotide Sequence
(SEQ ID NO: 83 thru SEQ ID NO: 145) 



301 CAGCTGACCA ATGGCATCAA AGAGCTGGGT CCCTACACCC TGGACAGAAA

351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GACCTCTGCG CCCAACACCA
401 GCACTCCTGG GACCTCCACA GTGGACCTTG GGACCTCAGG GACTCCATCC

451 TCCCTCCCCA GCCCTACA

(SEQ ID NO: 89)

1 TCTGCTGGCC CTCTCCTGGT GCCATTCACC CTCAACTTCA CCATCACCAA 

51 CCTGCAGTAC GAGGAGGACA TGCATCACCC AGGCTCCAGG AAGTTCAACA 

101 CCACGGAGCG GGTCCTGCAG GGTCTGCTTG GTCCCATGTT CAAGAACACC 

151 AGTGTCGGCC TTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGCCTGA 

201 GAAGAATGGG GCAGCCACTG GAATGGATGC CATCTGCAGC CACCGTCTTG 

251 ACCCCAAAAG CCCTGGACTC AACAGAGAGC AGCTGTACTG GGAGCTGAGC

301 CAGCTGACCC ATGGCATCAA AGAGCTGGGC CCCTACACCC TGGACAGGAA

351 CAGTCTCTAT GTCAATGGTT TCACCCATCG GAGCTCTGTG GCCCCCACCA

401 GCACTCCTGG GACCTCCACA GTGGACCTTG GGACCTCAGG GACTCCATCC

451 TCCCTCCCCA GCCCCACA

(SEQ ID NO: 90)

1 ACAGCTGTTC CTCTCCTGGT GCCGTTCACC CTCAACTTTA CCATCACCAA

51 TCTGCAGTAT GGGGAGGACA TGCGTCACCC TGGCTCCAGG AAGTTCAACA

101 CCACAGAGAG GGTCCTGCAG GGTCTGCTTG GTCCCTTGTT CAAGAACTCC

151 AGTGTCGGCC CTCTGTACTC TGGCTGCAGA CTGATCTCTC TCAGGTCTGA 

201 GAAGGATGGG GCAGCCACTG GAGTGGATGC CATCTGCACC CACCACCTTA 




TABLE 15 -continued 



CA125 Repeat Nucleotide Sequence
(SEQ ID NO: 83 thru SEQ ID NO: 145)

251 ACCCTCAAAG CCCTGGACTG GACAGGGAGC AGCTGTACTG GCAGCTGAGC 

301 CAGATGACCA ATGGCATCAA AGAGCTGGGC CCCTACACCC TGGACCGGAA

351 CAGTCTCTAC GTCAATGGTT TCACCCATCG GAGCTCTGGG CTCACCACCA

401 GCACTCCTTG GACTTCCACA GTTGACCTTG GAACCTCAGG GACTCCATCC

451 CCCGTCCCCA GCCCCACA





(SEQ ID NO: 91)

1 ACTGCTGGCC CTCTCCTGGT GCCATTCACC CTCAACTTCA CCATCACCAA
51 CCTGCAGTAT GAGGAGGACA TGCATCGCCC TGGATCTAGG AAGTTCAACA
101 CCACAGAGAG GGTCCTGCAG GGTCTGCTTA GTCCCATTTT CAAGAACTCC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTCTC TCAGGCCCGA
201 GAAGGATGGG GCAGCAACTG GAATGGATGC TGTCTGCCTC TACCACCCTA
251 ATCCCAAAAG ACCTGGACTG GACAGAGAGC AGCTGTACTG GGAGCTAAGC
301 CAGCTGACCC ACAACATCAC TGAGCTGGGC CCCTACAGCC TGGACAGGGA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GAACTCTGTG CCCACCACCA
401 GTACTCCTGG GACCTCCACA GTGTACTGGG CAACCACTGG GACTCCATCC
451 TCCTTCCCCG GCCACACA


(SEQ ID NO: 92)

1 GAGCCTGGCC CTCTCCTGAT ACCATTCACT TTCAACTTTA CCATCACCAA
51 CCTGCATTAT GAGGAAAACA TGCAACACCC TGGTTCCAGG AAGTTCAACA
101 CCACGGAGAG GGTTCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTCTC TCAGGCCCGA
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TABLE 15 -continued 



CA125 Repeat Nucleotide Sequence
(SEQ ID NO: 83 thru SEQ ID NO: 145) 

201 GAAGGATGGG GCAGCAACTG GAATGGATGC TGTCTGCCTC TACCACCCTA 
251 ATCCCAAAAG ACCTGGGCTG GACAGAGAGC AGCTGTACTG GGAGCTAAGC 
301 CAGCTGACCC ACAACATCAC TGAGCTGGGC CCCTACAGCC TGGACAGGGA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GAACTCTGTG CCCACCACCA 
401 GTACTCCTGG GACCTCCACA GTGTACTGGG CAACCACTGG GACTCCATCC 
451 TCCTTCCCCG GCCACACA 

(SEQ ID NO: 93)

1 GAGCCTGGCC CTCTCCTGAT ACCATTCACT TTCAACTTTA CCATCACCAA
51 CCTGCATTAT GAGGAAAACA TGCAACACCC TGGTTCCAGG AAGTTCAACA
101 CCACGGAGAG GGTTCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGACCTGA
201 GAAGCATGAG GCAGCCACTG GAGTGGACAC CATCTGTACC CACCGCGTTG
251 ATCCCATCGG ACCTGGACTG GACAGGGAGC GGCTATACTG GGAGCTGAGC
301 CAGCTGACCA ACAGCATTAC CGAACTGGGA CCCTACACCC TGGACAGGGA
351 CAGTCTCTAT GTCAATGGCT TCAACCCTCG GAGCTCTGTG CCAACCACCA
401 GCACTCCTGG GACCTCCACA GTGCACCTGG CAACCTCTGG GACTCCATCC
451 TCCCTGCCTG GCCACACA

(SEQ ID NO: 94)

1 GCCCCTGTCC CTCTCTTGAT ACCATTCACC CTCAACTTTA CCATCACCAA
51 CCTGCATTAT GAGGAAAACA TGCAACACCC TGGTTCCAGG AAGTTCAACA 
101 CCACGGAGAG GGTTCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAACACC 

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TABLE 15 -continued 



CA125 Repeat Nucleotide Sequence 
(SEQ ID NO: 83 thru SEQ ID NO: 145) 



151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGACCTGA
201 GAAGCATGAG GCAGCCACTG GAGTGGACAC CATCTGTACC CACCGCGTTG
251 ATCCCATCGG ACCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA



(SEQ ID NO: 95)

1 TCTGCTGGCC CTCTCCTGGT GCCATTCACC CTCAACTTCA CCATCACCAA
51 CCTGCAGTAC GAGGAGGACA TGCATCACCC AGGCTCCAGG AAGTTCAACA
101 CCACGGAGCG GGTCCTGCAG GGTCTGCTTG GTCCCATGTT CAAGAACACC
151 AGTGTCGGCC TTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGCCTGA
201 GAAGAATGGG GCAGCCACTG GAATGGATGC CATCTGCAGC CACCGTCTTG
251 ACCCCAAAAG CCCTGGACTC GACAGAGAGC AGCTGTACTG GGAGCTGAGC
301 CAGCTGACCC ATGGCATCAA AGAGCTGGGC CCCTACACCC TGGACAGGAA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCG GAGCTCTGTG GCCCCCACCA
401 GCACTCCTGG GACCTCCACA GTGGACCTTG GGACCTCAGG GACTCCATCC
451 TCCCTCCCCA GCCCCACA
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TABLE 15 -continued 



CA125 Repeat Nucleotide Sequence 
(SEQ ID NO: 83 thru SEQ ID NO: 145) 



(SEQ ID NO: 96)

1 ACAGCTGTTC CTCTCCTGGT GCCGTTCACC CTCAACTTTA CCATCACCAA
51 TCTGCAGTAT GGGGAGGACA TGCGTCACCC TGGCTCCAGG AAGTTCAACA
101 CCACAGAGAG GGTCCTGCAG GGTCTGCTTG GTCCCTTGTT CAAGAACTCC
151 AGTGTCGGCC CTCTGTACTC TGGCTGCAGA CTGATCTCTC TCAGGTCTGA
201 GAAGGATGGG GCAGCCACTG GAGTGGATGC CATCTGCACC CACCACCTTA
251 ACCCTCAAAG CCCTGGACTG GACAGGGAGC AGCTGTACTG GCAGCTGAGC
301 CAGATGACCA ATGGCATCAA AGAGCTGGGC CCCTACACCC TGGACCGGAA
351 CAGTCTCTAC GTCAATGGTT TCACCCATCG GAGCTCTGGG CTCACCACCA
401 GCACTCCTTG GACTTCCACA GTTGACCTTG GAACCTCAGG GACTCCATCC

451 CCCGTCCCCA GCCCCACA 





(SEQ ID NO: 97)

1 ACTGCTGGCC CTCTCCTGGT GCCATTCACC CTAAACTTCA CCATCACCAA
51 CCTGCAGTAT GAGGAGGACA TGCATCGCCC TGGATCTAGG AAGTTCAACG
101 CCACAGAGAG GGTCCTGCAG GGTCTGCTTA GTCCCATATT CAAGAACTCC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTCTC TCAGGCCCGA
201 GAAGGATGGG GCAGCAACTG GAATGGATGC TGTCTGCCTC TACCACCCTA
251 ATCCCAAAAG ACCTGGACTG GACAGAGAGC AGCTGTACTG GGAGCTAAGC
301 CAGCTGACCC ACAACATCAC TGAGCTGGGC CCCTACAGCC TGGACAGGGA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GAGCTCTATG ACGACCACCA
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TABLE 15 -continued 



CA125 Repeat Nucleotide Sequence 
(SEQ ID NO: 83 thru SEQ ID NO: 145) 



401 GAACTCCTGA TACCTCCACA ATGCACCTGG CAACCTCGAG AACTCCAGCC 
451 TCCCTGTCTG GACCTACG 

(SEQ ^^ccGCCAGCC CTCTCCTGGT GCTATTCACA ATCAACTGCA CCATCACCAA 

51 CCTGCAGTAC GAGGAGGACA TGCGTCGCAC TGGCTCCAGG AAGTTCAACA 

101 CCATGGAGAG TGTCCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAACACC 

151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA TTGACCTTGC TCAGGCCCAA 

201 GAAAGATGGG GCAGCCACTG GAGTGGATGC CATCTGCACC CACCGCCTTG 

251 ACCCCAAAAG CCCTGGACTC AACAGGGAGC AGCTGTACTG GGAGCTAAGC 

301 AAACTGACCA ATGACATTGA AGAGCTGGGC CCCTACACCC TGGACAGGAA 

351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GAGCTCTGTG TCCACCACCA

401 GCACTCCTGG GACCTCCACA GTGGATCTCA GAACCTCAGG GACTCCATCC
451 TCCCTCTCCA GCCCCACAAT TATG

(SEQ ID NO: 99)

1 NCNNCTGNCC CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA 

51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA 

101 CCACNGAGAG GGTCCTACAG GGTCTGCTCA GGCCCTTGTT CAAGAACACC 

151 AGTGTCAGCT CTCTGTACTC TGGTTGCAGA CTGACCTTGC TCAGGCCTGA 

201 GAAGGATGGG GCAGCCACCA GAGTGGATGC TGCCTGCACC TACCGCCCTG 

251 ATCCCAAAAG CCCTGGACTG GACAGAGAGC AACTATACTG GGAGCTGAGC 

301 CAGCTAACCC ACAGCATCAC TGAGCTGGGA CCCTACACCC TGGACAGGGT 
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TABLE 15 -continued 

CA125 Repeat Nucleotide Sequence 
(SEQ ID NO: 83 thru 145) 



10 



15 



351 CAGTCTCTAT 
401 GCACTCCTGG 
4 51 TCCCTGCCTG 

(SEQ ID NO: 100) 

1 GCCCCTGTCC 



GTCAATGGCT TCAACCCTCG GAGCTCTGTG CCAACCACCA 
GACCTCCACA GTGCACCTGG CAACCTCTGG GACTCCATCC 
GCCACACA 



in 



251 



Pi 

m 



35 



40 



51 CCTGCATTAT 

101 CCACGGAGAG 

151 AGCGTTGGCC 

2 01 GAAACATGGG 
251 ATCCCACTGG 

3 01 CAGCTGACCA 
351 CAGTCTCTAT 

4 01 GTATTCCTGG 
451 TCCCTCCCTG 

(SEQ ID NO: 101) 

1 GCCCCTGGCC 



CTCTCTTGAT 
GAAGAAAACA 
GGTTCTGCAG 
CTCTGTACTC 
GCAGCCACTG 
TCCTGGACTG 
ACAGCGTTAC 
GTCAATGGCT 
GACCTCTGCA 
GCCACACA 



ACCATTCACC 

TGCAACACCC 

GGTCTGCTCA 

TGGCTGCAGA 

GAGTGGACGC 

GACAGAGAGC 

AGAGCTGGGC 

TCACCCAGCG 

GTGCACCTGG 



CTCAACTTTA 

TGGTTCCAGG 

AGCCCTTGTT 

CTGACCTTGC 

CATCTGCACC 

GGCTATACTG 

CCCTACACCC 

GAGCTCTGTG 

AAACCTCTGG 



CCATCACCAA 

AAGTTCAACA 

CAAGAGCACC 

TCAGACCTGA 

CTCCGCCTTG 

GGAGCTGAGC 

TGGACAGGGA 

CCAACCACCA 

GACTCCAGCC 



45 



51 CCTGCAGTAT 

101 CCACGGAGAG 

151 AGTGTTGGCC 

201 AAAACGTGGG 

251 ACCCTCTAAA 



CTCTCCTGGT 

GAGGTGGACA 

AGTCCTGCAG 

CTCTGTACTC 

GCAGCCACCG 

CCCTGGACTG 



GCCATTCACC 

TGCGTCACCC 

GGTCTGCTCA 

TGGCTGCAGA 

GCGTGGACAC 

GACAGAGAGC 



CTCAACTTCA 
TGGTTCCAGG 
AGCCCTTGTT 
CTGACCTTGC 
CATCTGCACT 
AGCTATACTG 



CTATCACCAA 
AAGTTCAACA 
CAAGAGCACC 
TCAGGCCTGA 
CACCGCCTTG 
GGAGCTGAGC 



78 




301 AAACTGACCC GTGGCATCAT CGAGCTGGGC CCCTACCTCC TGGACAGAGG 
35! CAGTCTCTAT GTCAATGGTT TCACCCATCG GAACTTTGTG CCCATGACCA 
401 GCACTCCTGG GACCTCCACA GTACACCTAG GAACCTCTGA AACTCCATCC 
451 TCCCTACCTA GACCCATA 
(SEQ ^TGCCTGGCC CTCTCCTGGT GCCATTCACC CTCAACTTCA CCATCACCAA 
51 CTTGCAGTAT GAGGAGGCCA TGCGACACCG TGGCTCCAGG AAGTTCAATA 
2 S 101 CGACGGAGAG GGTCCTACAG GGTCTGCTCA GGCCCTTGTT CAAGAATACC 

151 AGTATCGGCC CTCTGTACTC CAGCTGCAGA CTGACCTTGC TCAGGGCAGA 
20 1 GAAGGACAAG GCAGCCAGCA GAGTGGATGC CATCTGTACC CACCACCCTG 
251 ACGCTCAAAG CCCTGGACTG AACAGAGAGC AGCTGTACTG GGAGCTGAGC 
k 301 CAGCTGACCC AGGGCATCAG TGAGCTGGGC CCCTACACCC TGGACAGGGA 

* 351 CAGTCTCTAT GTCGATGGTT TCACTCATTG GAGCCCCATA CCGACCACCA 

12 401 GCACTCCTGG GACCTCCATA GTGAACCTGG GAACCTCTGG GATCCCACCT 

35 4 51 TCCCTCCCTG AAACTACA 

(SEQ ^r^CTGNCC CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA 
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA 
101 CCACNGAGAG GGTTCTGCAG GGTCTGCTCA AACCCTTGTT CAGGAATAGC 
1S1 AGTCTGGAAT ACCTCTATTC AGGCTGCAGA CTAGCCTCAC TCAGGGCAGA 
20 1 GAAGGATAGC TCAGCCATGG CAGTGGATGC CATCTGCACA CATCGCCCTG 



m 



ED 
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TABLE 15 -continued 



CA125 Repeat Nucleotide Sequence 
(SEQ ID NO: 83 thru 145) 



251 ACCCTGAAGA 

301 AATCTGACAA 

351 CAGTCTCTAC 

401 GCACTCCTTG 

451 CCCGTCCCCA 

(SEQ ID NO: 104) 

1 ACTGCTGGCC 



CCTCGGACTG GACAGAGAGC GACTGTACTG GGAGCTGAGC 
ATGGCATCCA GGAGCTGGGC CCCTACACCC TGGACCGGAA 
GTCAATGGTT TCACCCATCG GAGCTCTGGG CTCACCACCA 
GACTTCCACA GTTGACCTTG GAACCTCAGG GACTCCATCC 
GCCCCACA 



51 CCTGCAGTAT 

101 CCACGGAGAG 

151 AGTGTTGGCC 

201 GAAGCAAGAG 

251 ATCCCATCGG 

301 CAGCTGACCA 

351 CAGTCTCTAT 

401 GCACTCCTGG 

451 TCCCTGCCTG 



CTCTCCTGGT 
GAGGAGGACA 
GGTTCTGCAG 
CTCTGTACTC 
GCAGCCACTG 
ACCTGGACTG 
ACAGCATCAC 
GTCAATGGCT 
GACCTCCACA 
GCCACACA 



GCCATTCACC 
TGCATCGCCC 
GGTCTGCTCA 
TGGCTGCAGA 
GAGTGGACAC 
GACAGAGAGC 
AGAGCTGGGA 
TCAACCCTTG 
GTGCACCTGG 



CTCAACTTCA 
TGGTTCCAGG 
CGCCCTTGTT 
CTGACCTTGC 
CATCTGTACC 
GGCTATACTG 
CCCTACACCC 
GAGCTCTGTG 
CAACCTCTGG 



CCATCACCAA 

AGGTTCAACA 

CAAGAACACC 

TCAGACCTGA 

CACCGCGTTG 

GGAGCTGAGC 

TGGATAGGGA 

CCAACCACCA 

GACTCCATCC 



45 



(SEQ ID NO: 105) 

1 GCCCCTGTCC 

51 CCTGCATTAT 
101 CCACGGAGAG 
151 AGCGTTGGCC 



CTCTCTTGAT ACCATTCACC CTCAACTTTA CCATCACCGA 
GAAGAAAACA TGCAACACCC TGGTTCCAGG AAGTTCAACA 
GGTTCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAGCACC 
CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGACCTGA 
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TABLE 15-continued 



c " c CTCCGCCTTG 

[0 251 ATCCCACTGG TCCTGGACTG GACAGAGAGC GGCTATACTG GGAGCTGAGC 

' 0 3 01 CAGCTGACCA ACAGCGTTAC AGAGCTGGGC CCCTACACCC TGGACAGGGA 

351 CGTCTCTAT GTCAATGGCT T CACCCA T CG GAGCTCTGTG CCAACCACCA 
» 40l GTATTCCTGG GACCTCTGCA GTGCACCTGG AAACCTCTGG GACTCCAGCC 

451 TCCCTCCCTG GCCACACA 
m (SE0 X Di HO = c 106) ggcc ctctcctggt QCCMTCACC CTCAACTTCA CTATCACCAA 
CCTGCAGTAT GAGGAGGACA TGCGTCACCC TGGTTCCAGG AAGTTCAGCA 
l0l CCACGGAGAG AGTCCTGCAG GGTCTGCTCA AGCCCTTGTT CAAOAACACC 
l51 AGTGTCAGCT CTCTGTACTC TGGTTGCAGA CTGACCTTGC TCAGGCCTGA 
aol GAAGGATGGG GCAGCCACCA GAGXGGAXGC TGTCTGCACC CATCGTCCTG 
asl ACCCCAAAAG CCCXGGACTG GACAGAGAGC GGCTGTACTG GAAGCXGAGC 
, 0l CAGCTGACCC ACGGCATCAC TGAGCTGGGC CCCTACACCC TGGACAGGCA 
JBl CAGTCTCTAT GTCAATGGTT TCACCCATCA GAGCTCTATG ACGACCACCA 
401 GAACTCCTGA TACCTCCACA ATGCACCTGG CAACCTCGAG AACTCCAGCC 
451 TCCCTGTCTG GACCTACG 
(MQ » »^>. CTCTCCTGGT GCTATTCACA ATTAACTTCA CCATCACTAA 
51 CCTGCGGTAT GAGGAGAACA TGCATCACCC TGGCTCTAGA AAGTTTAACA 
10i CCACGGAGAG AGTCCTTCAG GGTCTGCTCA GGCCTGTGTT CAAGAACACC 



Llj 
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TABLE 15 -continued 



— CTGACCACGC -CCCM 

201 GfiA GGATGGG — AAGTGGATGC CATCTGCACC 

251 MC CCAAAAG CCCTGGACTG GACAGAGAGC AGCTATACTG GGAGCTGAGC 

301 CAGCTAACCC ACAGCATCAC TGAGCTGGGC CCCTACACCC AGGACAGGGA 

, sl CAGTCTCTAT GTCAATCGCT tcacccatcg GAGCTCTGTG CCAACCACCA 

401 GTATTCCTGG GACCTCTGCA GTGCACCTGG AAACCTCTGG GACTCCAGCC 

451 TCCCTCCCTG GCCACACA 

(SEQ ^VcCcUcC CTCTCCTGGT GCCATTCACC CTCAACTTCA CTATCACCAA 

TPCGTCACCC TGGTTCCAGG AAGTTCAACA 
51 CCTGCAGTAT GAGGAGGACA TGCGTCALLL 

101 CCACGGAGAG AGTCCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAGCACC 
151 A «C CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGCCTGA 
201 _ CCT GGG GCAGCCACCG GCGTGGACAC CATCTGCACT CACCGCCTTC 
251 ^CCCTCTAAA CCCAGGACTG GACAGAGAGC AGCTATACTG GGAGCTGAGC 
301 ^CTGACCC GTGGCATCAT CGAGCTGGGC CCCTACCTCC TGGACAGAGG 
, sl CAGTCTCTAT GTCAATGGTT TCACCCATCG GACCTCTGTG CCCACCACCA 
401 OCACTCCTGG GACCTCCACA GTGGACCTTG GAACCTCAGG GACTCCATTC 
451 TCCCTCCCAA GCCCCGCA 
(SM ^r^^CTGNCC CTCTCCTGNT XCCKTTCACC NTCAACTTNA CCATCACCAA 
51 CCTGCANTAN GNGGANNACA TGCHHCKCCC NGGNTCCAGG AAGTTCAACA 
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TABLE 15 -continued 



CA125 Repeat Nucleotide Sequence 
(SEQ ID NO: 83 thru SEQ ID NO: 145) 



101 CCACNGAGAG GGTCCTGCAG ACTCTGCTTG GTCCTATGTT CAAGAACACC
151 AGTGTTGGCC TTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGTCCGA
201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCACC CACCGTCTTG
251 ACCCCAAAAG CCCTGGAGTG GACAGGGAGC AACTATACTG GGAGCTGAGC
301 CAGCTGACCA ATGGCATTAA AGAACTGGGC CCCTACACCC TGGACAGGAA

351 CAGTCTCTAT GTCAATGGGT TCACCCATTG GATCCCTGTG CCCACCAGCA
401 GCACTCCTGG GACCTCCACA GTGGACCTTG GGTCAGGGAC TCCATCCTCC
451 CTCCCCAGCC CCACA

(SEQ "^ACTGCTGGCC CTCTCCTGGT GCCGTTCACC CTCAACTTCA CCATCACCAA 
51 CCTGAAGTAC GAGGAGGACA TGCATTGCCC TGGCTCCAGG AAGTTCAACA 
101 CCACAGAGAG AGTCCTGCAG AGTCTGCTTG GTCCCATGTT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGTCCGA

201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCACC CACCGTCTTG

251 ACCCCAAAAG CCCTGGAGTG GACAGGGAGC AGCTATACTG GGAGCTGAGC

301 CAGCTGACCA ATGGCATCAA AGAGCTGGGT CCCTACACCC TGGACAGAAA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GACCTCTGCG CCCAACACCA
401 GCACTCCTGG GACCTCCACA GTGGACCTTG GGACCTCAGG GACTCCATCC

451 TCCCTCCCCA GCCCTACA 

(SEQ ID NO: 111)
1 [illegible] CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA

51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA 

101 CCACNGAGNG NGTNCTGCAG GGTCTGCTNN NNCCCNTNTT CAAGAACNCC 

151 AGTGTNGGCC NTCTGTACTC TGGCTGCAGA CTGACCTNNC TCAGGNCNGA 

201 GAAGNATGGN GCAGCCACTG GANTGGATGC CATCTGCANC CACCNNCNTN 

251 ANCCCAAAAG NCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC 

301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA 

351 CAGTCTCTAT GTCAATGGTT TCACCCATTG GATCCCTGTG CCCACCAGCA 

401 GCACTCCTGG GACCTCCACA GTGGACCTTG GGTCAGGGAC TCCATCCTCC 

451 CTCCCCAGCC CCACA 

(SEQ ID NO: 112)
1 [illegible] CTCTCCTGGT GCCGTTCACC CTCAACTTCA CCATCACCAA
51 CCTGAAGTAC GAGGAGGACA TGCATTGCCC TGGCTCCAGG AAGTTCAACA 
101 CCACAGAGAG AGTCCTGCAG AGTCTGCTTG GTCCCATGTT CAAGAACACC 
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTCGC TCAGGTCCGA 
201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCACC CACCGTGTTG 
251 ACCCCAAAAG CCCTGGAGTG GACAGGGAGC AGCTATACTG GGAGCTGAGC 
301 CAGCTGACCA ATGGCATCAA AGAGCTGGGT CCCTACACCC TGGACAGAAA 
351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GACCTCTGCG CCCAACACCA 
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC 
451 TCCNTCCCCN GCCNCACA 
(SEQ ID NO: 113)

1 TCTGCTGGCC CTCTCCTGGT GCCATTCACC CTCAACTTCA CCATCACCAA
51 CCTGCAGTAC GAGGAGGACA TGCATCACCC AGGCTCCAGG AAGTTCAACA
101 CCACGGAGCG GGTCCTGCAG GGTCTGCTTG GTCCCATGTT CAAGAACACC
151 AGTGTCGGCC TTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGCCTGA
201 GAAGAATGGG GCAACCACTG GAATGGATGC CATCTGCACC CACCGTCTTG
251 ACCCCAAAAG CCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA
(SEQ ID NO: 114)

1 NCNNCTGNCC CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGAG GGTTCTGCAG GGTCTGCTCA AACCCTTGTT CAGGAATAGC
151 AGTCTGGAAT ACCTCTATTC AGGCTGCAGA CTAGCCTCAC TCAGGCCAGA
201 GAAGGATAGC TCAGCCATGG CAGTGGATGC CATCTGCACA CATCGCCCTG
251 ACCCTGAAGA CCTCGGACTG GACAGAGAGC GACTGTACTG GGAGCTGAGC
301 AATCTGACAA ATGGCATCCA GGAGCTGGGC CCCTACACCC TGGACCGGAA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCG AAGCTCTATG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGGATGTGG GAACCTCAGG GACTCCATCC
451 TCCAGCCCCA GCCCCACG

(SEQ ID NO: 115)
1 [illegible] CTCTCCTGAT ACCATTCACC CTCAACTTCA CCATCACCAA
51 CCTGCAGTAT GGGGAGGACA TGGGTCACCC TGGCTCCAGG AAGTTCAACA 

101 CCACAGAGAG GGTCCTGCAG GGTCTGCTTG GTCCCATATT CAAGAACACC 

151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTCTC TCAGGTCTGA 

201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCATC CATCATCTTG 

251 ACCCCAAAAG CCCTGGACTC AACAGAGAGC GGCTGTACTG GGAGCTGAGC 

301 CAACTGACCA ATGGCATCAA AGAGCTGGGC CCCTACACCC TGGACAGGAA 

351 CAGTCTCTAT GTCAATGGTT TCACCCATCG GACCTCTGTG CCCACCACCA 

401 GCACTCCTGG GACCTCCACA GTGGACCTTG GAACCTCAGG GACTCCATTC 

451 TCCCTCCCAA GCCCCGCA



(SEQ ID NO: 116)

1 ACTGCTGGCC CTCTCCTGGT GCTGTTCACC CTCAACTTCA CCATCACCAA
51 CCTGAAGTAT GAGGAGGACA TGCATCGCCC TGGCTCCAGG AAGTTCAACA
101 CCACTGAGAG GGTCCTGCAG ACTCTGCTTG GTCCTATGTT CAAGAACACC
151 AGTGTTGGCC TTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGTCCGA
201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCACC CACCGTCTTG
251 ACCCCAAAAG CCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA

(SEQ ID NO: 117)
1 [illegible] CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGAG AGTCCTTCAG GGTCTGCTCA GGCCTGTGTT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGCCCAA
201 GAAGGATGGG GCAGCCACCA AAGTGGATGC CATCTGCACC TACCGCCCTG
251 ATCCCAAAAG CCCTGGACTG GACAGAGAGC AGCTATACTG GGAGCTGAGC
301 CAGCTAACCC ACAGCATCAC TGAGCTGGGC CCCTACACCC AGGACAGGGA
351 CAGTCTCTAT GTCAATGGCT TCACCCATCG GAGCTCTGTG CCAACCACCA
401 GTATTCCTGG GACCTCTGCA GTGCACCTGG AAACCACTGG GACTCCATCC
451 TCCTTCCCCG GCCACACA

(SEQ ID NO: 118)
1 GAGCCTGGCC CTCTCCTGAT ACCATTCACT TTCAACTTTA CCATCACCAA
51 CCTGCGTTAT GAGGAAAACA TGCAACACCC TGGTTCCAGG AAGTTCAACA
101 CCACGGAGAG GGTTCTGCAG GGTCTGCTCA CGCCCTTGTT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGACCTGA
201 GAAGCAGGAG GCAGCCACTG GAGTGGACAC CATCTGTACC CACCGCGTTG
251 ATCCCATCGG ACCTGGACTG GACAGAGAGC GGCTATACTG GGAGCTGAGC 

301 CAGCTGACCA ACAGCATCAC AGAGCTGGGA CCCTACACCC TGGATAGGGA 

351 CAGTCTCTAT GTCGATGGCT TCAACCCTTG GAGCTCTGTG CCAACCACCA 

401 GCACTCCTGG GACCTCCACA GTGCACCTGG CAACCTCTGG GACTCCATCC 

451 CCCCTGCCTG GCCACACA 

(SEQ ID NO: 119)
1 GCCCCTGTCC CTCTCTTGAT ACCATTCACC CTCAACTTTA CCATCACCGA

51 CCTGCATTAT GAAGAAAACA TGCAACACCC TGGTTCCAGG AAGTTCAACA 

101 CCACGGAGAG GGTTCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAGCACC 

151 AGCGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGACCTGA 

201 GAAACATGGG GCAGCCACTG GAGTGGACGC CATCTGCACC CTCCGCCTTG 

251 ATCCCACTGG TCCTGGACTG GACAGAGAGC GGCTATACTG GGAGCTGAGC

301 CAGCTGACCA ACAGCATCAC AGAGCTGGGA CCCTACACCC TGGATAGGGA 

351 CAGTCTCTAT GTCAATGGCT TCAACCCTTG GAGCTCTGTG CCAACCACCA 

401 GCACTCCTGG GACCTCCACA GTGCACCTGG CAACCTCTGG GACTCCATCC 

451 TCCCTGCCTG GCCACACA

(SEQ ID NO: 120)
1 ACTGCTGGCC CTCTCCTGGT GCCGTTCACC CTCAACTTCA CCATCACCAA

51 CCTGAAGTAC GAGGAGGACA TGCATTGCCC TGGCTCCAGG AAGTTCAACA 
101 CCACAGAGAG AGTCCTGCAG AGTCTGCATG GTCCCATGTT CAAGAACACC 
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGTCCGA 
201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCACC CACCGTCTTG
251 ACCCCAAAAG CCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA

(SEQ ID NO: 121)

1 NCNNCTGNCC CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGNG NGTNCTGCAG GGTCTGCTNN NNCCCNTNTT CAAGAACNCC
151 AGTGTNGGCC NTCTGTACTC TGGCTGCAGA CTGACCTNNC TCAGGNCNGA
201 GAAGNATGGN GCAGCCACTG GANTGGATGC CATCTGCANC CACCNNCNTN
251 ANCCCAAAAG NCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ACAGCATCAC AGAGCTGGGA CCCTACACCC TGGATAGGGA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCG AAGCTCTATG CCCACCACCA
401 GTATTCCTGG GACCTCTGCA GTGCACCTGG AAACCTCTGG GACTCCAGCC
451 TCCCTCCCTG GCCACACA
(SEQ ID NO: 122)

1 GCCCCTGGCC CTCTCCTGGT GCCATTCACC CTCAACTTCA CTATCACCAA
51 CCTGCAGTAT GAGGAGGACA TGCGTCACCC TGGTTCCAGG AAGTTCAACA
101 CCACGGAGAG AGTCCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAGCACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA [illegible] TCAGGCCTGA
201 AAAACGTGGG GCAGLLACLU [illegible] CATCTGCACT CACCGCCTTG
251 ACCCTCTAAA CCCTGGACTG NACAGNGAGC NGCTNTACTG [illegible]
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA

(SEQ ID NO: 123)
1 [illegible] CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGNG NGTNCTGCAG GGTCTGCTNN NNCCCNTNTT CAAGAACNCC
151 AGTGTNGGCC NTCTGTACTC TGGCTGCAGA CTGACCTNNC TCAGGNCNGA
201 GAAGNATGGN GCAGCCACTG GANTGGATGC CATCTGCANC CACCNNCNTN
251 ANCCCAAAAG NCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TTCACCCTCG GAGCTCTGTG CCAACCACCA
401 GCACTCCTGG GACCTCCACA GTGCACCTGG CAACCTCTGG GACTCCATCC
451 TCCCTGCCTG GCCACACA

(SEQ ID NO: 124)
1 [illegible] CTCTCTTGAT ACCATTCACC CTCAACTTTA CCATCACCAA

51 CCTGCATTAT GAAGAAAACA TGCAACACCC TGGTTCCAGG AAGTTCAACA 
101 CCACGGAGCG GGTCCTGCAG GGTCTGCTTG GTCCCATGTT CAAGAACACA
151 AGTGTCGGCC TTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGCCTGA
201 GAAGAATGGG GCAGCCACTG GAATGGATGC CATCTGCAGC [illegible]
251 ACCCCAAAAG CCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA

(SEQ ID NO: 125)
1 [illegible] CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGNG NGTNCTGCAG GGTCTGCTNN NNCCCNTNTT CAAGAACNCC
151 AGTGTNGGCC NTCTGTACTC TGGCTGCAGA CTGACCTNNC TCAGGNCNGA
201 GAAGNATGGN GCAGCCACTG GANTGGATGC CATCTGCANC CACCNNCNTN
251 ANCCCAAAAG NCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GAACTCTGTG CCCACCACCA
401 GTACTCCTGG GACCTCCACA GTGTACTGGG CAACCACTGG GACTCCATCC
451 TCCTTCCCCG GCCACACA 
(SEQ ID NO: 126)
1 [illegible] CTCTCCTGAT ACCATTCACT TTCAACTTTA CCATCACCAA
51 CCTGCATTAT GAGGAAAACA TGCAACACCC TGGTTCCAGG AAGTTCAACA
101 CCACGGAGAG GGTTCTGCAG GGTCTGCTCA CGCCCTTGTT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGACCTGA
201 GAAGCAGGAG GCAGCCACTG GAGTGGACAC CATCTGTACC CACCGCGTTG
251 ATCCCATCGG ACCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CGCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA
(SEQ ID NO: 127)
1 NCNNCTGNCC CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGNG NGTNCTGCAG GGTCTGCTNN NNCCCNTNTT CAAGAACNCC
151 AGTGTNGGCC CTCTGTACTC TGGCTGCAGA CTGACCTNNC TCAGGNCNGA 
201 GAAGNATGGN GCAGCCACTG GANTGGATGC CATCTGCANC CACCNNCNTN 
251 ANCCCAAAAG NCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC 
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA 
351 CAGTCTCTAT GTCAATGGTT TCACCCATCG GAGCTCTGTG CCAACCACCA 
401 GCAGTCCTGG GACCTCCACA GTGCACCTGG CAACCTCTGG GACTCCATCC 
451 TCCCTGCCTG GCCACACA

(SEQ ID NO: 128)

1 GCCCCTGTCC CTCTCTTGAT ACCATTCACC CTCAACTTTA CCATCACCAA
51 CCTGCATTAT GAAGAAAACA TGCAACACCC TGGTTCCAGG AAGTTCAACA
101 CCACGGAGAG GGTTCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAGCACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGACCTGA
201 GAAACATGGG GCAGCCACTG GAGTGGACGC CATCTGCACC CTCCGCCTTG
251 ATCCCACTGG TCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA



(SEQ ID NO: 129)

1 NCNNCTGNCC CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGNG NGTNCTGCAG GGTCTGCTNN NNCCCNTNTT CAAGAACNCC
151 AGTGTNGGCC NTCTGTACTC TGGCTGCAGA CTGACCTNNC TCAGGNCNGA
201 GAAGNATGGN GCAGCCACTG GANTGGATGC CATCTGCANC CACCNNCNTN
251 ANCCCAAAAG NCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCG GACCTCTGTG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGCACCTGG CAACCTCTGG GACTCCATCC
451 TCCCTGCCTG GCCACACA

(SEQ ID NO: 130)

1 [illegible] CTCTCTTGAT ACCATTCACC CTCAACTTTA CCATCACCAA
51 CCTGCAGTAT GAGGAGGACA TGCATCGCCC TGGATCTAGG AAGTTCAACA
101 CCACAGAGAG GGTCCTGCAG GGTCTGCTTA GTCCCATTTT CAAGAACTCC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTCTC TCAGGCCCGA
201 GAAGGATGGG GCAGCAACTG GAATGGATGC TGTCTGCCTC TACCACCCTA
251 ATCCCAAAAG ACCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA
(SEQ ID NO: 131)

1 NCNNCTGNCC CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGNG NGTNCTGCAG GGTCTGCTNN NNCCCNTNTT CAAGAACNCC
151 AGTGTNGGCC NTCTGTACTC TGGCTGCAGA CTGACCTNNC TCAGGNCNGA
201 GAAGNATGGN GCAGCCACTG GANTGGATGC CATCTGCANC CACCNNCNTN
251 ANCCCAAAAG NCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATTG GAGCTCTGGG CTCACCACCA
401 GCACTCCTTG GACTTCCACA GTTGACCTTG GAACCTCAGG GACTCCATCC
451 CCCGTCCCCA GCCCCACA
(SEQ ID NO: 132)
1 [illegible] CTCTCCTGGT GCCATTCACC CTAAACTTCA CCATCACCAA
51 CCTGCAGTAT GAGGAGGACA TGCATCGCCC TGGATCTAGG AAGTTCAACG
101 CCACAGAGAG GGTCCTGCAG GGTCTGCTTA GTCCCATATT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGACCTGA
201 GAAGCAGGAG GCAGCCACTG GAGTGGACAC CATCTGTACC CACCGCGTTG
251 ATCCCATCGG ACCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA
(SEQ ID NO: 133)
1 [illegible] CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGNG NGTNCTGCAG GGTCTGCTNN NNCCCNTNTT CAAGAACNCC
151 AGTGTNGGCC NTCTGTACTC TGGCTGCAGA CTGACCTNNC TCAGGNCNGA
201 GAAGNATGGN GCAGCCACTG GANTGGATGC CATCTGCANC CACCNNCNTN
251 ANCCCAAAAG NCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCG GAGCTTTGGG CTCACCACCA
401 GCACTCCTTG GACTTCCACA GTTGACCTTG GAACCTCAGG GACTCCATCC
451 CCCGTCCCCA GCCCCACA

(SEQ ID NO: 134)
1 ACTGCTGGCC CTCTCCTGGT GCCATTCACC CTAAACTTCA CCATCACCAA
51 CCTGCAGTAT GAGGAGGACA TGCATCGCCC TGGCTCCAGG AAGTTCAACA
101 CCACGGAGAG GGTCCTTCAG GGTCTGCTTA CGCCCTTGTT CAGGAACACC
151 AGTGTCAGCT CTCTGTACTC TGGTTGCAGA CTGACCTTGC TCAGGCCTGA
201 GAAGGATGGG GCAGCCACCA GAGTGGATGC TGTCTGCACC CATCGTCCTG
251 ACCCCAAAAG CCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA



(SEQ ID NO: 135)
1 NCNNCTGNCC CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGNG NGTNCTGCAG GGTCTGCTNN NNCCCNTNTT CAAGAACNCC
151 AGTGTNGGCC NTCTGTACTC TGGCTGCAGA CTGACCTNNC TCAGGNCNGA
201 GAAGNATGGN GCAGCCACTG GANTGGATGC CATCTGCANC CACCNNCNTN
251 ANCCCAAAAG NCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATTG GATCCCTGTG CCCACCAGCA
401 GCACTCCTGG GACCTCCACA GTGGACCTTG GGTCAGGGAC TCCATCCTCC
451 CTCCCCAGCC CCACA

(SEQ ID NO: 136)

1 ACTGCTGGCC CTCTCCTGGT ACCATTCACC CTCAACTTCA CCATCACCAA
51 CCTGCAGTAT GGGGAGGACA TGGGTCACCC TGGCTCCAGG AAGTTCAACA
101 CCACAGAGAG GGTCCTGCAG GGTCTGCTTG GTCCCATATT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTCTC TCAGGTCCGA
201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCATC CATCATCTTG
251 ACCCCAAAAG CCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA

(SEQ ID NO: 137)

1 NCNNCTGNCC CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGNG NGTNCTGCAG GGTCTGCTNN NNCCCNTNTT CAAGAACNCC
151 AGTGTNGGCC NTCTGTACTC TGGCTGCAGA CTGACCTNNC TCAGGNCNGA
201 GAAGNATGGN GCAGCCACTG GANTGGATGC CATCTGCANC CACCNNCNTN
251 ANCCCAAAAG NCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GACCTTTGCG CCCAACACCA
401 GCACTCCTGG GACCTCCACA GTGGACCTTG GGACCTCAGG GACTCCATCC
451 TCCCTCCCCA GCCCTACA

(SEQ ID NO: 138)
1 [illegible] CTCTCCTGGT GCCATTCACC CTCAACTTCA CCATCACCAA
51 CCTGCAGTAC GAGGAGGACA TGCATCACCC AGGCTCCAGG AAGTTCAACA
101 CCACGGAGCG GGTCCTGCAG GGTCTGCTTG GTCCCATGTT CAAGAACACC
151 AGTGTCGGCC TTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGCCTGA
201 GAAGAATGGG GCAGCCACCA GAGTGGATGC TGTCTGCACC CATCGTCCTG
251 ACCCCAAAAG CCCTGGACTG NACAGNGAGC NGCTNTACTG GGAGCTNAGC
301 CANCTGACCA ANNNCATCNN NGAGCTGGGN CCCTACACCC TGGACAGGNA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCN GANCTCTGNG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGNACNTNG GNACCTCNGG GACTCCATCC
451 TCCNTCCCCN GCCNCACA

(SEQ ID NO: 139)
1 NCNNCTGNCC CTCTCCTGNT NCCNTTCACC NTCAACTTNA CCATCACCAA
51 CCTGCANTAN GNGGANNACA TGCNNCNCCC NGGNTCCAGG AAGTTCAACA
101 CCACNGAGAG GGTTCTGCAG GGTCTGCTCA AGCCCTTGTT CAAGAGCACC
151 AGTGTTGGCC CTCTGTATTC TGGCTGCAGA CTGACCTTGC TCAGGCCTGA
201 GAAGGACGGA GTAGCCACCA GAGTGGACGC CATCTGCACC CACCGCCCTG
251 ACCCCAAAAT CCCTGGGCTA GACAGACAGC AGCTATACTG GGAGCTGAGC
301 CAGCTGACCC ACAGCATCAC TGAGCTGGGA CCCTACACCC TGGATAGGGA
351 CAGTCTCTAT GTCAATGGTT TCACCCAGCG GAGCTCTGTG CCCACCACCA
401 GCACTCCTGG GACTTTCACA GTACAGCCGG AAACCTCTGA GACTCCATCA
451 TCCCTCCCTG GCCCCACA
(SEQ ID NO: 140)
1 [illegible] CTGTCCTGCT GCCATTCACC CTCAATTTTA CCATCACTAA
51 CCTGCAGTAT GAGGAGGACA TGCATCGCCC TGGCTCCAGG AAGTTCAACA
101 CCACGGAGAG GGTCCTTCAG GGTCTGCTTA TGCCCTTGTT CAAGAACACC
151 AGTGTCAGCT CTCTGTACTC TGGTTGCAGA CTGACCTTGC TCAGGCCTGA
201 GAAGGATGGG GCAGCCACCA GAGTGGATGC TGTCTGCACC CATCGTCCTG
251 ACCCCAAAAG CCCTGGACTG GACAGAGAGC GGCTGTACTG GAAGCTGAGC
301 CAGCTGACCC ACGGCATCAC TGAGCTGGGC CCCTACACCC TGGACAGGCA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCA GAGCTCTATG ACGACCACCA
401 GAACTCCTGA TACCTCCACA ATGCACCTGG CAACCTCGAG AACTCCAGCC
451 TCCCTGTCTG GACCTACG

(SEQ ID NO: 141)
1 [illegible] CTCTCCTGGT GCTATTCACA ATTAACTTCA CCATCACTAA
51 CCTGCGGTAT GAGGAGAACA TGCATCACCC TGGCTCTAGA AAGTTTAACA 
101 CCACGGAGAG AGTCCTTCAG GGTCTGCTCA GGCCTGTGTT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTTGC TCAGGCCCAA
201 GAAGGATGGG GCAGCCACCA AAGTGGATGC CATCTGCACC TACCGCCCTG
251 ATCCCAAAAG CCCTGGACTG GACAGAGAGC AGCTATACTG GGAGCTGAGC
301 CAGCTAACCC ACAGCATCAC TGAGCTGGGC CCCTACACCC TGGACAGGGA
351 CAGTCTCTAT GTCAATGGTT TCACACAGCG GAGCTCTGTG CCCACCACTA
401 GCATTCCTGG GACCCCCACA GTGGACCTGG GAACATCTGG GACTCCAGTT
451 TCTAAACCTG GTCCCTCG

(SEQ ID NO: 142)

1 GCTGCCAGCC CTCTCCTGGT GCTATTCACT CTCAACTTCA CCATCACCAA
51 CCTGCGGTAT GAGGAGAACA TGCAGCACCC TGGCTCCAGG AAGTTCAACA
101 CCACGGAGAG GGTCCTTCAG GGCCTGCTCA GGTCCCTGTT CAAGAGCACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACTTTGC TCAGGCCTGA
201 AAAGGATGGG ACAGCCACTG GAGTGGATGC CATCTGCACC CACCACCCTG
251 ACCCCAAAAG CCCTAGGCTG GACAGAGAGC AGCTGTATTG GGAGCTGAGC
301 CAGCTGACCC ACAATATCAC TGAGCTGGGC CACTATGCCC TGGACAACGA
351 CAGCCTCTTT GTCAATGGTT TCACTCATCG GAGCTCTGTG TCCACCACCA
401 GCACTCCTGG GACCCCCACA GTGTATCTGG GAGCATCTAA GACTCCAGCC
451 TCGATATTTG GCCCTTCA
(SEQ ID NO: 143)

1 GCTGCCAGCC ATCTCCTGAT ACTATTCACC CTCAACTTCA CCATCACTAA
51 CCTGCGGTAT GAGGAGAACA TGTGGCCTGG CTCCAGGAAG TTCAACACTA
101 CAGAGAGGGT CCTTCAGGGC CTGCTAAGGC CCTTGTTCAA GAACACCAGT
151 GTTGGCCCTC TGTACTCTGG CTCCAGGCTG ACCTTGCTCA GGCCAGAGAA
201 AGATGGGGAA GCCACCGGAG TGGATGCCAT CTGCACCCAC CGCCCTGACC
251 CCACAGGCCC TGGGCTGGAC AGAGAGCAGC TGTATTTGGA GCTGAGCCAG
301 CTGACCCACA GCATCACTGA GCTGGGCCCC TACACACTGG ACAGGGACAG
351 TCTCTATGTC AATGGTTTCA CCCATCGGAG CTCTGTACCC ACCACCAGC

(SEQ ID NO: 144)

1 ACCGGGGTGG TCAGCGAGGA GCCATTCACA CTGAACTTCA CCATCAACAA
51 CCTGCGCTAC ATGGCGGACA TGGGCCAACC CGGCTCCCTC AAGTTCAACA
101 TCACAGACAA CGTCATGAAG CACCTGCTCA GTCCTTTGTT CCAGAGGAGC
151 AGCCTGGGTG CACGGTACAC AGGCTGCAGG GTCATCGCAC TAAGGTCTGT
201 GAAGAACGGT GCTGAGACAC GGGTGGACCT CCTCTGCACC TACCTGCAGC
251 CCCTCAGCGG CCCAGGTCTG CCTATCAAGC AGGTGTTCCA TGAGCTGAGC
301 CAGCAGACCC ATGGCATCAC CCGGCTGGGC CCCTACTCTC TGGACAAAGA
351 CAGCCTCTAC CTTAACGGTT ACAATGAACC TGGTCTAGAT GAGCCTCCTA
401 CAACTCCCAA GCCAGCCACC ACATTCCTGC CTCCTCTGTC AGAAGCCACA
451 ACA
(SEQ ID NO: 145)

1 GCCATGGGGT ACCACCTGAA GACCCTCACA CTCAACTTCA CCATCTCCAA
51 TCTCCAGTAT TCACCAGATA TGGGCAAGGG CTCAGCTACA TTCAACTCCA
101 CCGAGGGGGT CCTTCAGCAC CTGCTCAGAC CCTTGTTCCA GAAGAGCAGC
151 ATGGGCCCCT TCTACTTGGG TTGCCAACTG ATCTCCCTCA GGCCTGAGAA
201 GGATGGGGCA GCCACTGGTG TGGACACCAC CTGCACCTAC CACCCTGACC
251 CTGTGGGCCC CGGGCTGGAC ATACAGCAGC TTTACTGGGA GCTGAGTCAG
301 CTGACCCATG GTGTCACCCA ACTGGGCTTC TATGTCCTGG ACAGGGATAG
351 CCTCTTCATC AATGGCTATG CACCCCAGAA TTTATCAATC CGGGGCGAGT
401 ACCAGATAAA TTTCCACATT GTCAACTGGA ACCTCAGTAA TCCAGACCCC
451 ACATCCTCAG AGTAC
TABLE 17

Carboxy Terminal Nucleotide Sequence
(SEQ ID NO: 147)

1 GCCATGGGGT ACCACCTGAA GACCCTCACA CTCAACTTCA CCATCTCCAA
51 TCTCCAGTAT TCACCAGATA TGGGCAAGGG CTCAGCTACA TTCAACTCCA
101 CCGAGGGGGT CCTTCAGCAC CTGCTCAGAC CCTTGTTCCA GAAGAGCAGC
151 ATGGGCCCCT TCTACTTGGG TTGCCAACTG ATCTCCCTCA GGCCTGAGAA
201 GGATGGGGCA GCCACTGGTG TGGACACCAC CTGCACCTAC CACCCTGACC
251 CTGTGGGCCC CGGGCTGGAC ATACAGCAGC TTTACTGGGA GCTGAGTCAG
301 CTGACCCATG GTGTCACCCA ACTGGGCTTC TATGTCCTGG ACAGGGATAG
351 CCTCTTCATC AATGGCTATG CACCCCAGAA TTTATCAATC CGGGGCGAGT
401 ACCAGATAAA TTTCCACATT GTCAACTGGA ACCTCAGTAA TCCAGACCCC
451 ACATCCTCAG AGTACATCAC CCTGCTGAGG GACATCCAGG ACAAGGTCAC
501 CACACTCTAC AAAGGCAGTC AACTACATGA CACATTCCGC TTCTGCCTGG
551 TCACCAACTT GACGATGGAC TCCGTGTTGG TCACTGTCAA GGCATTGTTC
601 TCCTCCAATT TGGACCCCAG CCTGGTGGAG CAAGTCTTTC TAGATAAGAC
651 CCTGAATGCC TCATTCCATT GGCTGGGCTC CACCTACCAG TTGGTGGACA
701 TCCATGTGAC AGAAATGGAG TCATCAGTTT ATCAACCAAC AAGCAGCTCC
751 AGCACCCAGC ACTTCTACCT GAATTTCACC ATCACCAACC TACCATATTC
801 CCAGGACAAA GCCCAGCCAG GCACCACCAA TTACCAGAGG AACAAAAGGA
851 ATATTGAGGA TGCGCTCAAC CAACTCTTCC GAAACAGCAG CATCAAGAGT
901 TATTTTTCTG ACTGTCAAGT TTCAACATTC AGGTCTGTCC CCAACAGGCA

951 CCACACCGGG GTGGACTCCC TGTGTAACTT CTCGCCACTG GCTCGGAGAG
1001 TAGACAGAGT TGCCATCTAT GAGGAATTTC TGCGGATGAC CCGGAATGGT
1051 ACCCAGCTGC AGAACTTCAC CCTGGACAGG AGCAGTGTCC TTGTGGATGG
1101 GTATTCTCCC AACAGAAATG AGCCCTTAAC TGGGAATTCT GACCTTCCCT
1151 TCTGGGCTGT CATCCTCATC GGCTTGGCAG GACTCCTGGG ACTCATCACA
1201 TGCCTGATCT GCGGTGTCCT GGTGACCACC CGCCGGCGGA AGAAGGAAGG
1251 AGAATACAAC GTCCAGCAAC AGTGCCCAGG CTACTACCAG TCACACCTAG
1301 ACCTGGAGGA TCTGCAATGA CTGGAACTTG CCGGTGCCTG GGGTGCCTTT
1351 CCCCCAGCCA GGGTCCAAAG AAGCTTGGCT GGGGCAGAAA TAAACCATAT
1401 TGGTCGGAAA AAAAAAAAAA AA
TABLE 18

Carboxy Terminal Amino Acid Sequence
(SEQ ID NO: 148)

1 AMGYHLKTLT LNFTISNLQY SPDMGKGSAT FNSTEGVLQH LLRPLFQKSS
51 MGPFYLGCQL ISLRPEKDGA ATGVDTTCTY HPDPVGPGLD IQQLYWELSQ
101 LTHGVTQLGF YVLDRDSLFI NGYAPQNLSI RGEYQINFHI VNWNLSNPDP
151 TSSEYITLLR DIQDKVTTLY KGSQLHDTFR FCLVTNLTMD SVLVTVKALF
201 SSNLDPSLVE QVFLDKTLNA SFHWLGSTYQ LVDIHVTEME SSVYQPTSSS
251 STQHFYLNFT ITNLPYSQDK AQPGTTNYQR NKRNIEDALN QLFRNSSIKS
301 YFSDCQVSTF RSVPNRHHTG VDSLCNFSPL ARRVDRVAIY EEFLRMTRNG
351 TQLQNFTLDR SSVLVDGYSP NRNEPLTGNS DLPFWAVILI GLAGLLGLIT
401 CLICGVLVTT RRRKKEGEYN VQQQCPGYYQ SHLDLEDLQ
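
The correspondence between the carboxy terminal nucleotide sequence of Table 17 (SEQ ID NO: 147) and the amino acid sequence of Table 18 (SEQ ID NO: 148) can be checked with a short translation routine. The following is a minimal sketch only, assuming the standard genetic code and a reading frame beginning at nucleotide 1 of Table 17; the names `CODON_TABLE` and `translate` are illustrative and do not appear in the specification. Codons containing an undetermined base (such as the N positions in the repeat sequences of Table 15) are reported as X.

```python
# Standard genetic code, packed in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str) -> str:
    """Translate DNA in frame 1; codons with non-ACGT letters become 'X'."""
    # Drop the spaces that separate the printed blocks of ten bases.
    seq = "".join(nucleotides.split()).upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First row of Table 17 (nucleotides 1-50), copied as printed.
row_1 = "GCCATGGGGT ACCACCTGAA GACCCTCACA CTCAACTTCA CCATCTCCAA"
print(translate(row_1))  # AMGYHLKTLTLNFTIS
```

The sixteen residues printed agree with the first sixteen residues of Table 18 (AMGYHLKTLT LNFTIS). The same routine may be applied to later rows, provided the nucleotides are concatenated from position 1 so that the reading frame is preserved.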
TABLE 19A

Serine/Threonine O-Glycosylation Pattern Predicted for the
Amino Terminal End of the CA125 Molecule

(SEQ ID NO: J-*'' 



2ffl ^assqpetids^ahpgte^ 

% SAISTTISPGIPGVLTSLVTSSGRDISATFPa IPTLTLS SGEPETTTSFITYSETHTSSAIP1 
ISSGTDSTTTFPTLTETPVEP^^ 

T.VTSLVPSSGTDTSTTFPTLSETPYEP* n ^ m . SSAmW ^FTO»lW8^"^ RreISK 



K; lvtslvpssgtdtsttfpt^™— ^ 
2S psipgvvtsqvtssatdtstaiptltpspge^ 

^ TSSESHSSPDATPVMATSPRTEASSAVLT^ 

5 1S^SS L " S ™V QTSOT PS S3 PSPT 

TABLE 19B 



5'y 
vi 



55 m TSTS 



TTT TTTT. . -TT TT. 



TABLE 19B- continued 



'. Talvcosylation Pattern Predicted for the 
■^^ST^FE* of the CM25 Molecule 



Amino 

1600 
1680 
1760 
TABLE 20

Peptide 3 (SEQ ID NO: 156) and Peptide 4 (SEQ ID NO.

.TCCATGGGCCACACAGAGCCTGGCCCT ^ 



- + 



ATGAGAGGATCGCATCACCATCACCATCACGGA' 

1 tactctcctIgcg^ 



MRGSHHHHHHG 
CTCCTGATACCATTCACTTTCAACTTTACCATCAi 



SMGHTEPG 
t 

.CCAACCTGCATTATGAGGAAAACATG 



61 --- 

GAGGAC 

L L 



- - + - 



- + 



-+ 120 



TATGGTAAGTGAAAGTTGAAATGGTAGTGGTTGGACGTAATACTCCTTTTGTAC 

ITNLHYEENM 



F T F N F T 



.GGGTTCTGCAGGGTCTGCTCAAG ^ 



CAACACCCTGGTTCCAGGAAGTTCAACACCACGGAGA( + 

GGTGCCTCTCCCAAGACGTCCCAGACGAGTTC 

F N T T E R V L Q G L L K 



121 GTTGTGGGACCAAGGTCCTTCAAGTTGTi 



Q H P G S R K 



CTGTACTCTGGCTGCAGACTGACCTTGCTC ^ 



CCCTTGTTCAAGAACACCAGTGTTGGCCCT 
181 GGGAACAAGTTCTTGTGGTCACAACCGGG^ 



L F K N T S 



V G P L Y S 



G C R L T L L 



AGACCTGAGAAGCATGAGGCAGCCACTGGAGTGGACACCATCTGTACCCACCGCGTTGAT ^ 

241 TCTGGACTCTTCGTACTCCGTCGGTGAC^ 

R P E K H E A A T G V D T I C T H R V D - 

CCCATCGGACCTGGACTGGACAGAGAG 



\GCGGCTATACTGGGAGCTGAGCCAGCTGACCAAC ^ ^ ^ 
PIGPGLDRE j> T. Y W E 1, S U L Jl N - 



V3GGACAGTCTCTATGTCAATGGCTTC 



AGCATCACAGAGCTGGGACCCTACACCCTGGACAG + + ^ 



TABLE 20 (continued)

" , . p*i25 Repeat Showing Peptides 

^otid. -4 »i»o A=ia Immune ° f 

T L D R_.D_S_.L_ Y. V N G F - 

!|^__^ 4 eo 

421 TTGGGAGCCTCGAGACACGGTTGGTGGTC 

ACCTCTGGGACTCCATCCTCCCTGCCT ^ 
481 TGGAGACCCTGAGGTAGGAGGGACGGA 

T S G T P S S L P - 



(SEQ ID NO: 154) Peptide 1: L s Q L
(SEQ ID NO: 155) Peptide 2: T L D R D S L Y V
(SEQ ID NO: 156) Peptide 3: V L Q G L L K P L
(SEQ ID NO: 157) Peptide 4: NSlXEL
TABLE 21



CA125 Protein Sequence 
(SEQ ID NO: 162) 



51 
101 
151 
201 
251 
301 
351 
401 
451 
501 
551 



601 

651 

701 

751 
801 
851 
901 
951 

1001 

1051 

1101 

1151 
1201 
1251 
1301 
1351 
1401 
1451 
1501 
1551 
1601 



— T SS SSSSS EK-s s-EE 

=i lis SEES ss -sss 

SSSSS 5- SS SSSS. — « 

MWLTHPM miPIWW ^PKTIMLV TBPE 

SPG6ED SS SIS laaktsttmk 'Sevpgwt 

JPTSTISP6V |^™°" TPSMTTSHGA E= SS »""" HPT PTVSPG 

sssss sssss. sss s ss 

S««3 ISS GMVTSLVTSS W= WJ ^ 

ss «ssss E£ «sss 

XSATFPTVPE SPHES^TA TSQVT SSGTDTSITI ^ 

ssss s s =s sees s — 

ETPYEPETTA ^""f™ vPSSGTDTST TFPTLSETPY ^ 

mmmmm 



1651 



2151 
2201 
2251 

"ft 2301 



2351 
2401 



[D 2451 



AT VPFMVPFTLN 

FTITNLQYEE D «RHPGSRKF K f= - = 
170 1 QT.PPKKDSSA MAVDAI CTHR PDPEDLGLDR ERLYWELbN 
1Q 1751 TS =D VOTSGTPSSS PS^^ 

1801 MPFTLNFTIT NLQYEEDMRR TGSRKFNI WELSKLTNDI 

1851 S^OTLTIJjRX-SlS^-^^S^-^^yg^g^p qtSTVDLRTS GTPSSLSSPT 

19 01 EELGPYTLDR NSLYVNGFTH QSSVSTTSTP ^ LQGLLGPIFK 

195 1 IMAAGPLLVP FTLNFTITNL Q^EDMGHPG ^FNT QLNRERLYWE 

15 2001 NTSVGPLYSG ?J^1^-^^ STVDLGTSGT 

2051 LSQLTNGIKE ^YTLDRNS ™THRT ^RPGSRKF NTTERVLQTL 

2101 ssss ss^»™ s— 

a „,i "vo J— SSSS- 

•:.| 2551 NTSTPGTSTV DLGTSGTPSS LPSPTbAU PEKNGAATGM p 

/I 2601 HPGSRKFNTT ERVLQGLLGP MFKNTSVGLL mshTJN( ^ * 

W 2651 DAICSHRLDP KSPGLNREQL ™ELSQLTHG ™P TLNFTITNLQ S 

CO 2701 Sr^vaptst pgtstvdlgt sgtpsslpsp ll^ SG ^si3sm a 

30 2751 YGEDMRHPGS RKFNTTERVL Q^PLFKN SSVGPL ^-^^^ 
p 2801 GAATGVDAI C_THHLNPQSPG ™QLYWQL g^TNG PLLVPFTLNF 

l - 2851 yvngftw^glttstpwts tvdlgtsgtp |^psp PLYSG crlts 

W 2901 TITNLQYEED MHRPGSRKFN ATERVLQGLL S P ™^ HNITE I5p7s 
rii 2 9 9 51 aPSHSAALSH^P NPKRPGLDRE QLYWEL QLT HNITE^P^ 
33 3001 LDRDSLYVNG FTHQNSVPTT STPGTSTVYW JTTGT KNTSVGPLYS 

40 mi sss^^^^E SSSSK ssss 

3351 LPGHTAPVPL LIPFTLNFTI ™LHYEENMQ ^RKFN iGPGLDREXL 

3401 LFKNTSVGPL ^SGOlin^P^^^^^; J^g,^ 

34 51 YWELSXLTXX IXELGPYXLD ^LYVNGFX RKFNTTERVL 

45 3501 SGTPXXXPXX TSAGPLLVPF TLNFTITNLQ SHRLDPKSPG 

3 5 51 QGLLGPMFKN TSVGLLYSG C RLT ^^^ ^pfthrss VAPTSTPGTS 

3601 LDREQLYWEL SQLTHGIKEL GPYTLDRNSL ™HRSS 

3651 TVDLGTSGTP SSLPSPTTAV « lRSEKDGAAT GVDAICTHHL 

3701 TTERVLQGLL GPLFKNSSVG Pff SGTFT.TS yTH RSSGLTT 

50 3751 NPQSPGLDRE <*™*g ££SK P^NFTITN LQYEEDMHRP 

1= tSS SSSSS =T! NCTITNLQYE 



3851 
3901 
3951 



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TABLE 21 - continued 

CA125 Protein Sequence 
(SEQ ID NO: 162) 



4001 
4051 
4101 
4151 
4201 
4251 
4301 
4351 
4401 
4451 
4501 
4551 
4601 
4651 
4701 
4751 
4801 
4851 
4901 
4951 
5001 
5051 
5101 
5151 
5201 
5201 



5251 
5301 
5351 
5401 
5451 
5501 
5551 
5601 
5651 
5701 
5751 



EDMRRTGSRK 
ATGVDAICTH 
NGFTHQSSVS 
TITNLXYEEX 
LRPEKDGAML 
LDRVSLYVNG 
XPFTLNFTIT 
SGCRLASLRP, 
QELGPYTLDR 
TAGPLLVPFT 
SVGPLYSGCR 
QLTNSITELG 
SLPGHTAPVP 
PLFKSTSVGP 
LYWELSQLTN 
TSGTPASLPG 
LQGLLKPLFK 
GLDRERLYWK 
STMHLATSRT 
NTTERVLQGL 
PDPKSPGLDR 
TSIPGTSAVH 
PGSRKFNTTE 
TICTHRLDPL 
RTSVPTTSTP 
EEXMXXPGSR 
AATGVDAICT 
VNGFTHWIPV 
TNLKYEEDMH 
SEKDGAATGV_ 
RNSLYVNGFT 
TLNFTITNLX 
RLTLLRXEKX_ 
GPYXLDRXSL 
LLVPFTLNFT 
LYSGCRLTSL 
GIKELGPYTL 



FNTMESVLQG 
RLDPKSPGLN 
TTSTPGTSTV 
MXXPGSRKFN 
RVDAACTYRP 

"fnprssvptt 

NLXYEEXMXX 
JKDSSAMAVD 
NSLYVNGFTH 
LNFTITNLQY 
^LTLLRPEKQE__ 
PYTLDRDSLY 
LLIPFTLNFT 
LYSGCRLTLL 
SVTELGPYTL 
HTAPGPLLVP 
NTSVSSLYSG 
LSQLTHGITE 
PASLSGPTTA 
LRPVFKNTSV 
EQLYWELSQL 
LETSGTPASL 
RVLQGLLKPL 
NPGLDREQLY 
GTSTVDLGTS 
KFNTTERVLQ 
HRLDPKSPGV 
PTSSTPGTST 
CPGSRKFNTT 
^AICTHRLDP 
HQTSAPNTST 
YEEXMXXPGS 
XAAMVDXXC 
YVNGFTHWIP 
ITNLKYEEDM 
_RSEKDGAATG 
DRNSLYVNGF 



LLKPLFKNTS 
REQLYWELSK 
DLRTSGTPSS 

ttervlqgll 

DPKSPGLDRE 
STPGTSTVHL 
PGSRKFNTTE 
^AICTHRPDPE 
RSSFLTTSTP 
EEDMHRPGSR 
_^TGVDTICT 
VNGFNPWSSV 



VG pLYSGCRkTLLRPK!S^ 



ITDLHYEENM 
^PEKHGAATG 
DRDSLYVNGF 



FTLNFTITNL 
CRLTLLRPEK_ 
LGPYTLDRHS 
SPLLVLFTIN 
GPLYSGCRLT 
THSITELGPY 
PGHTAPGPLL 
FKSTSVGPLY 
WELSKLTRGI 
GTPFSLPSPA 
TLLGPMFKNT 
DREQLYWELS 
VDLGSGTPSL 
ERVLQSLLGP 
KSPGVDREQL 
PGTSTVDLGT 
RKFNTTERVL 
XXXXDPXXPG 
VPTSSTPGTS 
HCPGSRKFNT 
VDAICTHRVD 
THQTSAPNTS 



LTNDIEELGP 
LSSPTIMXXX 
RPLFKNTSVS 
QLYWELSQLT 
ATSGTPSSLP 
RVLQGLLKPL 
DLGLDRERLY 
WTSTVDLGTS 
RFNTTERVLQ 
HRVDPIGPGL 
PTTSTPGTST 
QHPGSRKFNT 
VDAICTLRLD 
THRSSVPTTS 



QYEEDMRHPG 
DGAATRVDAV_ 
LYVNGFTHQS 
FTITNQRYEE 
_LLRPKKDGAA 
TQDRDSLYVN 
VPFTLNFTIT 
SGCRLTLLRP_ 
IELGPYLLDR 
XXXPLLXPFT 
SVGLLYSGCR 
QLTNGIKELG 
PSSPTTAGPL 
MFKNTSVGPL 
YWELSQLTNG 
SGTPSSLPSP 
QGLLXPXFKX 
LDREXLYWEL 
TVDLGSGTPS 
TERVLQSLLG 
PKSPGVDREQ 
TPGTSTVDLG 



YTLDRNSLYV 
PLLXPFTLNF 
SLYSGCRLTL 
HSITELGPYT 
GHTXX XPLL 
FRNSSLEYLY 
WELSNLTNGI 
GTPSPVPSPT 
GLLTPLFKNT 
DRERLYWELS 
VHLATSGTPS 
TERVLQGLLK 
PTGPGLDRER 
IPGTSAVHLE 
SRKFSTTERV 

_CTHRPDPKSP 
SMTTTRTPDT 
NMHHPGSRKF 
TKVDAICTYR 

"gfthrssvpt 

NLQYEEDMRH 
^EKRGAATGVD 
GSLYVNGFTH 
LNFTITNLXY 
jjTLLRSEKDG 
PYTLDRNSLY 
LVPFTLNFTI 
YSGCRLTLLR 
IKELGPYTLD 
TXXXPLLXPF 
TSVGXLYSGC 
SXLTXXIXEL 
SLPSPTTAGP 
PMFKNTSVGP 
LYWELSQLTN 
TSGTPSSLPS 






TABLE 21 - continued

CA125 Protein Sequence
(SEQ ID NO: 162)






5801 

5851 

5901 

5951 

6001 

6051 

6101 

6151 

6201 

6251 

6301 

6351 

6401 

6451 






" cdvfNTTERV LQGLLGPMFK 

PXXXPXXTXX XPLLXPFTL^ sjffl£ SSAMAVP|||^ tSTPGTSTVD 
VGTSGTPSSS ^TAGP SGCRLTSLRS^S^f^l^vpTTSTP 

5SSSS S3S« SSSSS ESSE — S 

Sii sssss ss 
— i ssss ssss s ss ss 

^Si^g-^^rSSHX™™ S2££5 GLLXPXFKXT 
XELGPYXLDR EE XMXXPGSR K ™™VLU DREXLYWELS 

SSS9L^2?K X M ^LETSGTPA 

LQGLLXPXFK ^^^^ 
GLDREXLYWE ^SXLTXXIA LN FTITNLHYEE N ^ 

NTTERVLQGL ^GPMFKN iXXIXELGPY XLDRXSLYVN ^ 
LDPKSPGLDR EXLYWELSXL pLL XPFTLNFTIT NLX 

TSTPGTSXVX LXTSGTPXXX y SGCW i l^Ri|g X ^ 5iS 

PGSRKFNTTE ^LQGLLXPX j XELGPYXLDR XSLYVN 

€i Ss ssss Si isi 
E ^H1 sssss ss ss s -ss 

VNGFXXXXXX KpNT TERVLQGLLX PXF *^ YIXELGPYXL 

ITNLXYEEXM XXPGSRKFN1 x LYWELSXLTX XIX 

ffiBATX_^XXXXD P HLA TSGTPSSLPG « 

SSSS «S SES SSS — 

XP



6601 
6651 
6701 
6751 
6801 
6851 
6901 
6951 
7001 
7051 
7101 
7151 
7201 
7251 
7301 
7351 
7401 
7451 
7501 
7551 
7601 
7651 
7701 
7751 
7801 
7851 
7901 
7951 
8001 
8051 
8101 
8151 
8201 
8251 
8301 



TABLE 21 - continued

CA125 Protein Sequence 
(SEQ ID NO: 162) 






8351 
8401 
8451 
8501 
8551 
8601 
8651 
8701 
8751 
8801 
8851 
8901 
8951 
9001 
9051 
9101 
9151 
9201 
9251 
9301 
9351 
9401 
9451 
9501 
9551 
9601 
9651 
9701 
9751 
9801 
9851 
9901 
9951 
10001 
10051 
10101 
10151 
10201 
10251 
10301 
10351 
10451 
10501 
10551 
10601 
10651 
10701 
10751 
10801 
10851 
10901 



SSSSE ^H^^SESS SEES 

ii S SSSS^iss 5S55S 

isi JSSES i» = »sss 

lis s=s= 3^gs 

SEE !S= SSg 

cxxxxdpxxp gldrexlywe lsxltxxixe wp ptitNLQYEE 

iGLTTSTPWT STVDLGTSGT PSWPSPTTA ^ cgLTU^PETOGAA 
Spgsrkf NTTERVLQGL LTPLFRNTSV ssly 

SScTHR PDPKSPGLDR EXLYWELSXL TXXIX xPFTLNFTIT 
iSxXX TSTPGTSXVX Bjg ^SVGXLY SCCR^ 

nlxyeexmxx pgsrkfntte hvdqglwpx xelgpyxldr 

FKXXAATXVD XXCXXXXDPX XPGL^XLY " AG PLLVPFTL 

S^^^n^vPTSSTP GTSTVDLGSG ™ssi* VGPLYS GCRL 

SSSyG EDMGHPGSRK LTXX IXE L GP 

SIS^GA^TGVDAICIH HLDPKSPGLD ^ XPXXTXXXPL 

ISII^liGHraXX XTSTPGTSXV ^XTb XFKXTSV GXL 

SS5SSS TNLXYEEXMX YWELS x L TXX 

S^TUAMna*»S5L^S?! ^tstvdlgt sgtpsslpsp 

IX^HD^SlSvNGFrHQ^^ KKFNTTERVL QGLLGPMFKN 

tsagSpf tlnftitnlq ^eedmhhpgs RKfn LDREX lywel 

^JSaGCMffiiBgeS^^^ XXXTSTPGTS XVXLXTSGTP 

G pSS SSS; — - qptGPGLDRE 
SJSSvG PLYSGCRLTL^EKHS^^^t SI p G TSAVHL 

"Eg S55SS F £ = — 

SSS S5SSS SisSssES; 
S= sssss SSks^SgS 

SoSPGLN REQLYWELSQ LTHGITELGP Y pTI tnlXYEEXMX 

TTSTPGTSIV NLGTSGIPPS LPE ™*^ ^tllR^EKDGWTRV 

vpSSfntt ervlqgllkp lfkstsvgpl "^^s-msiSvngft 

SfcTHRPDP KIPGLDRQQL ^ELSQLTHS TLNFTITNLQ 
HiJpTTST PGTFTVQPET SETPSSLPGP ™* LYSGC3L TLLRPEro 

SeSgs rkfnttervl sSSgitel^p™^ 
SSSdwc thrpdpkspg »lywkl sqlt pllvlf 

f^S^THOSS MTTTRTPDTS TMHLATSRTP ^ Q pLYSG CRLTL 

SSeh mhhpgsrkfn ttervlqgll Rpw hsitelgpyt 
tppSaat kvdaictyrp dpkspgldre Qjwj^ gpsaaspllv 
^S«vptt «2gSS SSSlrslf kstsvgplys 
i^S^SLS™=S PRLDREQIjYW elsqlthnit 
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TABLE 21 - continued

CA125 Protein Sequence
(SEQ ID NO: 162)






10951 

11001 

11051 

11101 

11151 

11201 

11251 

11301 

11351 

11401 



ELGHYALDND 
ASHLLILFTL 

GPLYSGSRVL 
THSITELGPY 

R YMADMGQPG 

NGAETRVDUl 
LYLNGYNEPG 

lqyspdmgkg 

DGAATGVDTT 
LFINGYAPQN 



SLFVNGFTHR S*2ESS 

TdSpttpkp attflpplse 

cSsTEGV LQHLLRPLFQ 

Sdpvgp gldiqqlywe 



TPTVYLGASK TPASIFGPSA 
SrVLQGL LRPLFKNTSV 

pStcpgldr eqlylelsql 

TSTGVVSEEP FTLNFTINNL 

SSgar^tg crvi^RSH 

^nOTHGITR LGPVSLDKDS 
SpFYLG CQLISLRPEJS 

PDPTSSEY 



11451 
11501 
11551 
11601 
11651 
11701 



TLYKGSQLHD 
LNASFHWLGS 
QDKAQPGTTN 
HTGVDSLCNF 
YSPNRNEPIjT 
EYNVQQQCPG 



sssss ss 

YYQSHIjDIiED LQ 



IT LLRDIQDKVT 
MJSSWBPS ^VEQVFLDKT 
^SSSTQHFYL NFTITNLPYS 
r^YFSDCQV stfrsvpnrh 



Carboxy Terminal Domain
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TABLE 22 



CA125 Repeat Nucleotide Sequence
(SEQ ID NO: 307)



1 ACTGCTGGCC CTCTCCTGGT GCCATTCACC CTCAACTTCA CCATCACCAA
51 CCTGCAGTAT GAGGAGGACA TGCATCGCCC TGGATCTAGG AAGTTCAACA
101 CCACAGAGAG GGTCCTGCAG GGTCTGCTTA GTCCCATATT CAAGAACACC
151 AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTCTC TCAGGTCTGA
201 GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCATC CATCATCTTG
251 ACCCCAAAAG CCCTGGACTC AACAGAGAGC GGCTGTACTG GGAGCTGAGC
301 CGACTGACCA ATGGCATCAA AGAGCTGGGC CCCTACACCC TGGACAGGAA
351 CAGTCTCTAT GTCAATGGTT TCACCCATCG GACCTCTGTG CCCACCACCA
401 GCACTCCTGG GACCTCCACA GTGGACCTTG GAACCTCAGG GACTCCATTC
451 TCCCTCCCAA GCCCCGCA



TABLE 23 



CA125 Repeat Amino Acid Sequence
(SEQ ID NO: 308)



1 TAGPLLVPFT LNFTITNLQY EEDMHRPGSR KFNTTERVLQ GLLSPIFKNT
51 SVGPLYSGCR LTSLRSEKDG AATGVDAICI HHLDPKSPGL NRERLYWELS
101 RLTNGIKELG PYTLDRNSLY VNGFTHRTSV PTTSTPGTST VDLGTSGTPF
151 SLPSPA
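The relationship between Tables 22 and 23 can be checked computationally. The following is a minimal Python sketch, not part of the original listing. It assumes that the amino acid repeat of Table 23 (SEQ ID NO: 308) is the first-reading-frame translation of the nucleotide repeat of Table 22 (SEQ ID NO: 307) under the standard genetic code. The names `translate`, `format_blocks`, and `REPEAT_DNA` are illustrative only, and the nucleotide string is the Table 22 sequence as read from this listing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: Table 22 (SEQ ID NO: 307) as transcribed from this listing.
REPEAT_DNA = "".join("""
ACTGCTGGCC CTCTCCTGGT GCCATTCACC CTCAACTTCA CCATCACCAA
CCTGCAGTAT GAGGAGGACA TGCATCGCCC TGGATCTAGG AAGTTCAACA
CCACAGAGAG GGTCCTGCAG GGTCTGCTTA GTCCCATATT CAAGAACACC
AGTGTTGGCC CTCTGTACTC TGGCTGCAGA CTGACCTCTC TCAGGTCTGA
GAAGGATGGA GCAGCCACTG GAGTGGATGC CATCTGCATC CATCATCTTG
ACCCCAAAAG CCCTGGACTC AACAGAGAGC GGCTGTACTG GGAGCTGAGC
CGACTGACCA ATGGCATCAA AGAGCTGGGC CCCTACACCC TGGACAGGAA
CAGTCTCTAT GTCAATGGTT TCACCCATCG GACCTCTGTG CCCACCACCA
GCACTCCTGG GACCTCCACA GTGGACCTTG GAACCTCAGG GACTCCATTC
TCCCTCCCAA GCCCCGCA
""".split())


def translate(dna: str) -> str:
    """Translate a DNA string in reading frame 1, ignoring trailing bases."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def format_blocks(seq: str, block: int = 10, per_row: int = 5) -> list:
    """Lay a sequence out as numbered rows of space-separated blocks."""
    rows = []
    width = block * per_row
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        rows.append(f"{start + 1} " + " ".join(blocks))
    return rows


if __name__ == "__main__":
    for row in format_blocks(translate(REPEAT_DNA)):
        print(row)
```

Under these assumptions, the 468-nucleotide repeat translates to a 156-residue peptide. The printed rows can be compared block by block against Table 23.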
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